Abstract


The phytochrome gene family encodes photoreceptor proteins that serve many functions throughout the life of a
plant. From studies of the angiosperm Arabidopsis, the family has been modeled as comprising five loci,
PHYA-PHYE. However, in most nonangiosperms, one locus, or at most two, is present. Moreover, it is shown here that the
Arabidopsis model does not completely represent some angiosperm groups. For example, additional PHY loci related
to PHYA and PHYB of Arabidopsis have evolved independently several times in dicot angiosperms, and monocot
angiosperms (as well as Piper) may lack orthologs of Arabidopsis PHYD and PHYE. Nonetheless, for studies of
organismal evolution, the phytochrome gene family is a potential source of phylogenetic information because the loci
occur as single copy sequences, and preliminary data suggest that the various loci are evolving independently. In the
plant family Fabaceae, phytochrome data are shown to provide phylogenetic resolution to a taxonomically very difficult
tribe of tropical woody genera that include Millettia, Lonchocarpus, and Derris. In addition to nucleotide substitutions,
phylogenetically informative insertions and deletions helped to resolve relationships in this group of legumes. Also,
the presence of a legume-specific locus related to PHYA should prove to be phylogenetically informative once its
taxonomic distribution is better understood.


Most molecular phylogenies of plants are inferred from one or two genes, and these usually from chloroplast
or nuclear ribosomal DNA sequences. When discordance between molecular phylogenies occurs, biological
phenomena such as introgressive hybridization or lineage sorting from polymorphic ancestry may explain the
disparity (e.g., Harrison et al., 1987; Rieseberg & Brunsfeld, 1992; Soltis et al., 1992). Such differences also
may result from lack of resolution in one of the data sets (e.g., Olmstead, 1989), or from mistaken orthology
(e.g., Goodman et al., 1979; Doyle, 1992). Thus, determining organismal relationships requires that evolutionary
hypotheses derived from single genes be tested with further data (e.g., Pamilo & Nei, 1988; Takahata, 1989).
DNA sequences from the low copy fraction of the nuclear genome potentially provide novel phylogenetic
resolution, specifically at the organismal level, since certain of the processes that lead to incongruence of
species and gene trees (e.g., uniparental inheritance, nonhomologous recombination) may be less frequent.

The low copy fraction of nuclear DNA remains underexplored in phylogenetic studies of plants, and initial
investigations of DNA sequences from multigene families have revealed some potential problems related to
concerted evolution (sensu Zimmer et al., 1980). For example, an analysis of rbcS nucleotide sequences
(Meagher et al., 1989) indicated that gene conversions among rbcS loci have occurred in each genus examined,
leading to regions of "partial homology" (Patterson, 1987) and thus to the possibility of mistaken orthology.
Sanderson & Doyle (1992) suggested, however, that the probability of reconstructing a reliable organismal
phylogeny is high from DNA sequences of multigene families in which concerted evolution is infrequent.
Preliminary data indicate that this is the case in such gene families as actin (Shah et al., 1983; Drouin &
Dover, 1990; McElroy et al., 1990) and phytochrome (Sharrock & Quail, 1989; Dehesh et al., 1991; Heyer &
Gatz, 1992a, b; Clack et al., 1994; Adam et al., 1993); consequently, these multigene families should yield
data pertinent to studies of organismal phylogenies. Furthermore, an advantage of multigene families in
phylogenetic reconstruction is that, in addition to nucleotide substitution and insertion/deletion characters,
the presence or absence of loci can be phylogenetically informative.

The phytochromes are photoreceptors for red and far-red light in all land plants and green algae (reviewed in
Quail, 1991; Furuya, 1993). Each subunit of these large cytoplasmic receptors comprises a protein of 1100 to
1200 amino acids and a covalently attached linear tetrapyrrole chromophore. Existing in two continuously
interconvertible forms, Pr, the red light-absorbing form, and Pfr, the far-red light-absorbing and biologically
active form, phytochrome mediates diverse developmental responses throughout the plant's life cycle. These
responses include germination, seedling hypocotyl elongation, stem cell differentiation, plastid development,
flavonoid pigment synthesis, and floral induction in response to photoperiod. Modulation of plant gene
expression by phytochrome is well documented (Nagy et al., 1988). While the mechanisms whereby phytochrome
participates in cellular signalling remain unknown, regions of the polypeptide required for chromophore
attachment, spectral integrity, biological activity, and dimerization have been identified (Cherry et al., 1993;
Edgerton & Jones, 1992).

Several reports have described the presence of only a single PHY gene in certain nonangiosperms (Hanelt et
al., 1992; Kolukisaoglu et al., 1993; Morand et al., 1993; Okamoto et al., 1993; Thümmler et al., 1992;
Winands et al., 1992), while evidence of two PHY genes is reported for other nonangiosperms. For example,
Maucher et al. (1992) refer to a putative second gene in the fern Dryopteris filix-mas L., although the
fragment remains uncharacterized. Two unpublished PHY sequence fragments from Psilotum nudum (L.) Griseb.
(GenBank accessions X74930, X74931) differ from one another in the region of overlap; and two PHY cDNAs from
Pinus palustris Mill. reportedly have been cloned and partially sequenced (Furuya, 1993), while a single PHY
cDNA from Ginkgo biloba L. is cited in the same report. However, in angiosperms, five related sequences
encoding phytochrome proteins designated PHYA-PHYE have been characterized from Arabidopsis thaliana (D.C.)
Schur (Sharrock & Quail, 1989; Clack et al., 1994). The genes for these five phytochromes have been mapped to
Arabidopsis chromosomes 1, 2, 4, and 5 (unpublished), and no evidence for PHY pseudogenes was found. Homologs
of Arabidopsis PHYA and PHYB have been characterized in other angiosperms (Adam et al., 1993; Christensen &
Quail, 1989; Dehesh et al., 1991; Hershey et al., 1985; Heyer & Gatz, 1992a, b; Kay et al., 1989; Sato, 1988;
Sharrock et al., 1986). A putative pseudogene most similar to PHYA has been reported in Pisum (Sato, 1990),
and a cDNA clone from Zea containing a partial PHY fragment has been interpreted as a pseudogene (Christensen
& Quail, 1989). Overall, these studies suggest that the gene family increases in complexity from
nonangiosperms to angiosperms. This suggestion is consistent with data recently submitted to GenBank (see
Results).

Nearly all PHY genes that are fully characterized share high sequence identity (App. 1) and structural
similarity with the Arabidopsis loci (Fig. 1). Peptide fragments from the nonangiosperms Psilotum (Hanelt et
al., 1992), Anemia phyllitidis (L.) Sw., and Dryopteris filix-mas (Maucher et al., 1992) share high sequence
identity with the Arabidopsis phytochromes in their N-termini (App. 1, 2), and small internal PHY peptides
from the alga Mesotaenium caldariorum (Lagerh.) Hansg. are highly similar to both N- and C-terminal peptides
of other phytochromes (Morand et al., 1993). Two exceptional PHY genes have been described in nonangiosperms.
The PHY gene sequence from the alga Mougeotia scalaris Hassel (Winands et al., 1992) contains additional
introns in the N-terminal coding sequence, and in the PHY gene from the moss Ceratodon purpureus (Hedw.) Brid.
the conserved N-terminal region is combined with a highly divergent C-terminal coding region (Fig. 1), which
encodes a putative light-regulated protein kinase (Thümmler et al., 1992). However, in another moss,
Physcomitrella patens (Hedw.) B.S.G., the C-terminal coding region is similar to all other PHY genes
(Kolukisaoglu et al., 1993). No unusual PHY loci have been described in angiosperms.

The PHYA-E genes in Arabidopsis are differentially expressed in response to the light environment (Sharrock
& Quail, 1989; Somers et al., 1991; Clack et al., 1994), and unique physiological functions have been assigned
to two phytochrome proteins. Phytochrome A controls the far-red high-irradiance response (Nagatani et al.,
1993; Parks & Quail, 1993; Whitelam et al., 1993), whereas phytochrome B controls red light regulation of stem
length and flowering time, and the end-of-day far-red light response (Reed et al., 1993; Wester et al., 1994).
This functional divergence together with high sequence divergence (approximately 50% among the PHYA, PHYB,
and PHYC loci) suggests that nonhomologous recombination is infrequent among PHY genes of Arabidopsis. If the
loci are evolving independently, distinguishing orthologs from paralogs should not be difficult. To test this
hypothesis, and to ascertain the phylogenetic utility of PHY sequence data, PCR (polymerase chain reaction)
was used to sample multiple PHY loci from genomic DNAs of diverse species of land plants for sequence
information, and these data were subjected to phylogenetic analysis.

Figure 1. Phytochrome gene structure of Arabidopsis (Clack et al., 1994) and Ceratodon (Thümmler et al.,
1992), from N-terminus (left) to C-terminus (right) showing untranslated regions (lines), exons (filled rectangles),
introns (shaded rectangles), and the approximate site of chromophore attachment (triangle).

Materials and Methods 

Total DNA was isolated from fresh, lyophilized, or dried herbarium material of taxa listed in Appendix 3 by
standard methods (Doyle & Doyle, 1987). Aliquots were extracted once with phenol:chloroform-isoamyl alcohol
(1:1 volume), and the aqueous portions were purified over sepharose CL-6B (Pharmacia, Piscataway, New Jersey)
columns. To assess phytochrome diversity in early land plants, DNA sequences from different nonangiosperm
phyla available in the literature (Appendix 2 and Kolukisaoglu et al., 1993) were included in the analyses
with those determined during the present study. The most complete PHY sequence from Psilotum obtained from
GenBank (accession X74931, lacking 510 3' nucleotide sites out of the 3417 nucleotide sites in the full-length
sequence data set) was used in phylogenetic analyses, but was not included in final alignments because it did
not significantly affect the consensus sequence. Likewise, the PHY sequences from Physcomitrella and from the
angiosperm Nicotiana (GenBank accessions X66784, L10114) were used in phylogenetic analyses, but were not
included in Appendix 1. DNAs were sampled from different subclasses of angiosperms (sensu Cronquist, 1981)
and, from legumes, DNAs were sampled to include two to three divergent members of the tribes Robinieae,
Millettieae, and Dalbergieae in order to make preliminary evaluation of biogeographic hypotheses (e.g., Lavin
& Luckow, 1993). The two species sampled from Millettia (M. dura Dunn and M. richardiana (Baill.) D. J. Du Puy
& J. Labat) and Sesbania (S. sesban (L.) Merr. and S. vesicaria (Jacq.) Elliot) are not thought to be closely
related within each genus.

A region of the PHY gene that encodes a peptide including and proximal to the chromophore attachment site was
amplified using PCR, resulting in a target of 270-350 bp (App. 1). Oligonucleotides with equimolar mixtures of
nucleotide pairs at two-fold degenerate sites and inosines (I) at three- to four-fold degenerate sites were
designed to amplify all possible target sequences in template DNAs flanked by the conserved upstream peptide
HYPATDIP (5'-CA[TC]TA[TC][TC]CIGCIACIGA[TC]AT[TCA]CC-3') and downstream PFPLRYAC
(5'-C[AC]CAIGC[AG]TAIC[GT]IA[AG]IGG[AG][AT]AICC-3'). These peptide sequences are conserved in all Arabidopsis
phytochromes and in the amino acid sequences inferred from other fully sequenced dicot and monocot genes, and
they flank a region comprising variation likely to be phylogenetically informative. Standard PCR protocols
(Perkin-Elmer, Norwalk, Connecticut) were modified to include an initial 5 cycles in which annealing
temperatures were less stringent (e.g., 45-49°C). The PCR products were converted to blunt-end fragments with
T4 DNA polymerase (BRL, Gaithersburg, Maryland) and were ligated to EcoRV-cut bacteriophage M13KRV8.2.
M13KRV8.2 carries an EcoK cassette that facilitates screening of nonrecombinants in an E. coli strain which is
rK+ mK+ (Waye et al., 1985). Transformation of E. coli with the ligation product yielded a population of M13
PHY clones containing amplified genomic PHY sequences. Individual clones were cultured, and double-stranded
phage DNA was isolated from bacterial pellets by alkaline-lysis minipreparation. Inserts cut from M13 vectors
using EcoRI and HindIII were resolved on 3% NuSieve (FMC, Rockland, Maine), or 2% standard, agarose gels, and
in some cases were further screened by restriction enzyme digestion to avoid sequencing duplicate clones.
Single-stranded DNAs for Sanger dideoxy sequencing (Sequenase version 2.0, USB, Cleveland, Ohio) were isolated
from recombinants carrying putative PHY inserts. In most cases, sequences of both orientations were
determined, and multiple PCR products from two accessions or genera were sequenced to detect possible
contamination and PCR errors. Peptide sequences were multiply aligned using ALIGN (Scientific & Education
Software, State Line, Pennsylvania) and GDE 2.2 (Steven Smith and University of Illinois) and were adjusted by
eye; peptide alignments were the basis for multiple nucleotide sequence alignments. For sequence comparisons,
alignment gaps in certain regions of insertion/deletion were deleted, while gaps that could be identified as
homologous were coded as single characters. Nonhomologous 3' and 5' nucleotide sites were not included in the
data matrices used in cladistic and distance analyses.
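
As a rough illustration of the degenerate primer design described above, the following Python sketch counts how many distinct oligonucleotides an equimolar primer mixture contains. It assumes the bracket notation used for the upstream HYPATDIP primer, treats inosine (I) as a single universal base rather than as a mixture, and uses hypothetical helper names (`parse_degenerate`, `degeneracy`) that are not part of the original protocol.

```python
import re

# Upstream primer for the peptide HYPATDIP, in the bracket notation of the text.
# Brackets enclose the bases mixed at a degenerate site; I stands for inosine.
UPSTREAM = "CA[TC]TA[TC][TC]CIGCIACIGA[TC]AT[TCA]CC"


def parse_degenerate(oligo):
    """Return one string of allowed bases per primer position."""
    tokens = re.findall(r"\[([A-Z]+)\]|([A-Z])", oligo)
    return [bracketed or single for bracketed, single in tokens]


def degeneracy(oligo):
    """Number of distinct sequences in the equimolar mixture.

    Assumption: inosine counts as one base, because it is incorporated as a
    single universal nucleotide and not as a mixture of A, C, G, and T.
    """
    count = 1
    for allowed in parse_degenerate(oligo):
        count *= len(allowed)
    return count


if __name__ == "__main__":
    print("positions:", len(parse_degenerate(UPSTREAM)))
    print("degeneracy:", degeneracy(UPSTREAM))
```

Under these assumptions the upstream primer is 23 nucleotides long and 48-fold degenerate, which shows how inosine at the three- to four-fold sites keeps the mixture small.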

Sequences were compared using maximum parsimony algorithms available in PHYLIP (Felsenstein, 1993), Hennig86
(Farris, 1988), and PAUP (Swofford, 1993). Minimal length trees resulting from heuristic search options
available in either Hennig86 (mh*, bb* with no upper limit set), PHYLIP (DNAPARS), or in PAUP (CLOSEST or
RANDOM data addition sequence, HOLD option set for 5 trees when applicable, STEEPEST DESCENT, MULPARS, and TBR
branch swapping options activated, with branch swapping on nonminimal trees, and MAXTREES set at 10,000) were
used as starting trees for further PAUP analyses (CLOSEST data addition sequence, STEEPEST DESCENT, MULPARS
and TBR options activated, with branch swapping on nonminimal trees), with the latter resulting in shorter
trees. Support for monophyly of clades was evaluated using bootstrap resampling (Felsenstein, 1985) and decay
analysis (Bremer, 1988). Pairwise distances were estimated using the Kimura 2-parameter option available in
MEGA (Kumar et al., 1993), and absolute and relative evolutionary rates were calculated by the methods of
Kimura (1981) and Wu & Li (1985), respectively. All matrices subject to distance, cladistic, and rate analyses
are available on request from the first author. Tree analysis and graphical output were performed with
MacClade (Maddison & Maddison, 1992) and COMPONENT (Page, 1993). However, tree mapping procedures based on the
model of Goodman et al. (1979), which evaluate whether incongruence of gene and species trees could be due to
sampling error (Page, 1990), were not performed because of the preliminary nature of this study.
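
The Kimura 2-parameter distance mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the standard formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion; it is not the MEGA implementation, and the function name `k2p_distance` and the choice to skip gapped or ambiguous sites are assumptions made here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are ignored
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "AAAAAAAAAG"))
```

For two sequences that differ by a single transition in ten sites, the corrected distance is slightly larger than the observed proportion of 0.1, reflecting the allowance for multiple substitutions at a site.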

For the cladistic analysis of the full length sequences, trees were rooted by designating PHY sequences from
Physcomitrella, Selaginella, and Adiantum capillus-veneris L. (Okamoto et al., 1993) as the outgroups, because
they are the only fully characterized PHY genes from nonangiosperms. For analysis of partial sequences in
angiosperms, Selaginella was retained as an outgroup, along with the PHY sequences from the gymnosperms Ginkgo
and Pseudotsuga that were determined during this analysis.

In all cladistic analyses, first, second, and third codon positions were equally weighted for the following
reasons. First, empirically determined transition/transversion ratios did not vary significantly from 1.0 for
any comparisons except for those between closely related legume sequences that were differentiated by very few
total substitutions (e.g., <3% of all sites were variable). Second, results from cladistic analyses under
certain differential weighting schemes are apparently the same as those from analyses under equal weighting
schemes when taxonomic sampling is adequate (Albert et al., 1993; Cracraft & Helm-Bychowski, 1991). Finally,
all codon positions may exhibit similar levels of homoplasy (see Chase et al., 1993); thus a rationale for
excluding or differentially weighting codon positions is difficult to define. In these analyses, third codon
positions, and perhaps many of the synonymous substitutions, were determined by bootstrap resampling analyses
to be phylogenetically very informative, with confidence intervals for just the third codon position of
between 90 and 100%, or at least as high as the values obtained for the first or second position.
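
To make the codon-position comparison concrete, the sketch below tallies transitions and transversions separately for first, second, and third codon positions in a pair of aligned coding sequences. It is only an illustration of the kind of count behind a transition/transversion ratio; the function name `ts_tv_by_codon_position` is hypothetical, and the code assumes that the alignment is in frame and begins at a first codon position.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_by_codon_position(seq1, seq2):
    """Count transitions (ts) and transversions (tv) at codon positions 1-3.

    Assumptions: both sequences are aligned, in frame, and start at a first
    codon position; sites with gaps or ambiguous bases are skipped.
    """
    counts = {pos: {"ts": 0, "tv": 0} for pos in (1, 2, 3)}
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pos = i % 3 + 1  # 1, 2, or 3 within the codon
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            counts[pos]["ts"] += 1
        else:
            counts[pos]["tv"] += 1
    return counts


if __name__ == "__main__":
    # Two short in-frame fragments that differ only at third positions.
    print(ts_tv_by_codon_position("ATGGCA", "ATAGCC"))
```

Dividing the "ts" count by the "tv" count for a given position gives the ratio for that position; with very few substitutions, as in the closely related legume sequences, such ratios are poorly estimated.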



Results 

The orthology of fully sequenced PHY genes from various species to individual PHY loci from Arabidopsis has
commonly been established by overall similarity (Dehesh et al., 1991; Heyer & Gatz, 1992a, b; Quail, 1991;
Furuya, 1993). Similarities in gene expression and regulation have been used secondarily to imply orthology
(Furuya, 1993). However, overall similarity may not reflect phylogeny, and phylogenetically related loci may
differ in function due to mutations in cis-regulatory regions (e.g., Doyle, 1991; Li & Noll, 1994). Since
orthology is best determined by shared ancestry, as evidenced by synapomorphies, cladistic analysis was used
to determine the orthology of all available full length PHY sequences to those characterized from Arabidopsis.
A single most parsimonious tree (Fig. 2) was generated in this analysis and it resolved the following
monophyletic clades with strong (90-100%) bootstrap support: all monocot PHYAs, all dicot PHYAs, all PHYAs,
all PHYAs + Arabidopsis PHYC, just PHYB and PHYD of Arabidopsis, just PHYBs and Arabidopsis PHYD, Arabidopsis
PHYE + all PHYBs and Arabidopsis PHYD, all angiosperm PHYs, all angiosperm PHYs + Psilotum, and angiosperm
PHYs + Psilotum + Adiantum. Seventy-eight trees were found by keeping all trees that were ≤30 steps longer
than the most parsimonious one; all clades were retained in all trees that are 20 steps longer, except for
Arabidopsis PHYC + all PHYAs. The two trees that were one step longer than the minimal length tree varied in
their placement of PHYC as the sister group of either the PHYA or PHYB/D/E clade. These results thus suggest
that, for example, the dicot and monocot PHYAs are orthologous, as are the dicot and rice PHYBs. Additionally,
evidence is provided for the sister group relationship of PHYE with PHYB + PHYD, and for a later duplication
giving rise to Arabidopsis PHYB or PHYD.

Using degenerate primers and amplification by PCR, target sequences from all five Arabidopsis genes, as well
as from multiple PHY genes of other angiosperms, were recovered in single cloning experiments. Single PHY
sequences were obtained from the nonangiosperms Equisetum and Pseudotsuga and two were obtained from Ginkgo.
Inserts varied from 270 to 350 bp, and a region of insertion and deletion corresponding to residues 398 to 415
(App. 1) was eliminated from broad comparisons because nucleotide site homologies could not be determined.
However, this region could be retained in narrower comparisons, where site homologies were more readily
established, as in the Fabaceae data set (App. 4).

Similarly to the analysis of full-length sequences, angiosperm sequences determined in this study were
cladistically analyzed to determine their orthology to the PHY loci of Arabidopsis (Figs. 3-5). Each sequence
occurred in a monophyletic clade that included a single, specific PHY locus of Arabidopsis, providing evidence
for distinct PHY subfamilies. Retention of a clade in a strict consensus tree (Figs. 2-5), resulting from the
mhennig and branch-and-bound search options in Hennig86 or from heuristic options available in PAUP (see
above), was considered good evidence of monophyly. Results from bootstrap resampling and decay analyses
revealed that some clades were strongly supported (>95%, d > 5-20).

The Arabidopsis PHYA sequence was included in a distinct monophyletic lineage in the dicot cladogram (Fig. 4).
In the phylogenetic analysis of monocot sequences (Fig. 3), monocot orthologs of PHYA (Fig. 2) were
substituted for Arabidopsis PHYA. Likewise, Arabidopsis PHYA was replaced by Pisum PHYA in the analysis of
legume sequences (Fig. 5), also based on results depicted in Figure 2. A notable finding was that from three
plant taxa, Ceratophyllaceae, Caryophyllaceae, and Fabaceae, two different PCR products were amplified that
were determined to be most closely related to Arabidopsis PHYA. These are interpreted to be duplicated PHYA
loci, and in legumes, the additional locus is here designated PHYA' (Fig. 5). These additional PHYA-related
sequences appear to have arisen independently in the three plant groups (Figs. 4, 5). For example, the legume
phytochrome phylogeny (Fig. 5) depicts this monophyletic PHYA' clade as being derived from within the legume
PHYA lineage (which is thus paraphyletic). Also, it is well supported by a bootstrap value of 95%, and, in a
global analysis of legume PHYA' with all other angiosperm loci, it is most closely related to legume PHYA
(cladogram not shown). It thus appears that the evolution of the phytochrome gene family in the Fabaceae has
involved the duplication of the PHYA locus. A similar argument can be made for the duplicated PHYA genes in
Ceratophyllaceae and Caryophyllaceae (Fig. 4). In the PHYA subfamily, and in other cases described below, this
pattern of diversification is attributed to the evolution of a new locus rather than to allelic diversity.
With the exception of genes that are under frequency-dependent selection, such as alleles of the S-locus
(Ioerger et al., 1990) and MHC loci (Klein et al., 1993), levels of divergence among alleles at most loci are
much lower (e.g., Gaut & Clegg, 1993; Thomas et al., 1993) than those observed among PHYA and the duplicated
PHYA loci.

Sequences homologous to Arabidopsis PHYC were amplified commonly in monocots (Fig. 3). In dicots, only DNA of
Dianthus yielded a sequence homologous to Arabidopsis PHYC. The homologs of PHYC in monocots were identified
by their close relationship with just Arabidopsis PHYC in a global analysis (cladogram not shown). The PHYC
homolog in Dianthus was identified by its sister group relationship with Arabidopsis PHYC (Fig. 4).

[Figure 2 tree not reproduced. Terminal labels, in order: Physcomitrella; Selaginella; Adiantum; Psilotum;
PHYD Arabidopsis; PHYB Nicotiana; PHYB Solanum; PHYB Oryza; PHYE Arabidopsis; PHYA Avena; PHYA Oryza; PHYA
Zea; PHYA Arabidopsis; PHYA Pisum; PHYA Nicotiana; PHYA Solanum; PHYA Cucurbita; PHYC Arabidopsis.]

Figure 2. Single most parsimonious tree from analysis of 2637 variable nucleotide sites from the full-length
phytochrome sequences. The length is 11,376, the CI = 0.459, and the RI = 0.502. Bootstrap values (from 500
replications) and decay indices are included on the best supported clades. The Nicotiana sequences were obtained
from GenBank accessions X66784 and L10114.

Figure 3. Single most parsimonious tree from analysis of all monocot sequence data, which comprised 169
informative sites. The length is 799, the CI = 0.44, and the RI = 0.52. Bootstrap values (from 500 replications)
and decay indices are included on the best supported clades. Single uppercase letters to the right of the generic names
are the names of the homologous Arabidopsis PHY loci.

Figure 4. Strict consensus of four most parsimonious trees from analysis of all dicot sequence data, which
comprised 172 informative sites. The length is 1743, the CI = 0.23, and the RI = 0.49. Bootstrap values (from
500 replications) and decay indices are included on the best supported clades. Single uppercase letters to the right
of the generic names are the names of the homologous Arabidopsis PHY loci.

Sequences homologous to Arabidopsis PHYE were not amplified in monocots using the primer set described above.
However, such homologs were commonly amplified in dicots, and the homology of these sequences to PHYE was
readily established by the inclusion of Arabidopsis PHYE in monophyletic gene lineages (e.g., Fig. 4).
Although the Arabidopsis PHYE sequence was not included in the legume data set (Fig. 5), two representative
legumes were included in the dicot analysis shown in Figure 4, and these were part of the monophyletic gene
lineage that included Arabidopsis PHYE. In the legume gene phylogeny, the bootstrap value for the PHYE clade
was 100%, thus revealing how strongly this lineage is supported by the data in narrow comparisons at the
taxonomic level of the family.

[Figure 5 tree not reproduced. Terminal genera, in order: Dalbergia, Tipuana, Xeroderris, Caragana, Wisteria,
Pisum, Lathyrus, Clianthus, Myrospermum, Hebestigma, Lennea, Dalbergia, Dalbergiella, Kunstleria, Xeroderris,
Lonchocarpus, Millettia, Derris, Millettia, Piscidia, Wisteria; the locus letters beside the names are not
recoverable.]

Figure 5. Strict consensus of 6500 minimal length trees generated from an mhennig* and branch and bound*
search option on the 174 informative sites of the Fabaceae data set. Length = 545, CI = 0.534, and RI = 0.841.
Bootstrap values (from 1000 replications) and decay indices are included on the best supported clades. Single uppercase
letters to the right of the generic names represent the orthologs of the Arabidopsis PHY loci.

The evolution of genes related to Arabidopsis PHYB has been more complex, with the apparently independent
duplication and divergence of PHYB-related genes in some dicot lineages, but perhaps not in monocot lineages
(Figs. 3, 4). The notable pattern here is that the Arabidopsis PHYB and PHYD sequences are sister groups in
comparisons including dicots (Figs. 2, 4), and together with the sequence from Myrospermum are the sister
group of the other PHYB/PHYD-related sequences. Note that two PHYB/PHYD-related sequences occur in
Lycopersicon, forming a monophyletic clade, with a PHYB-related sequence from Solanum, that is separate from
the clade containing Arabidopsis PHYB and PHYD; two of the PHYB/PHYD-related sequences from Daucus also form a
monophyletic clade (Fig. 4). This pattern could result from nonhomologous recombination between loci, but the
hypothesis of recent divergence is consistent with the putative absence of additional PHYB-like sequences from
monocots. Additionally, PHYD in Arabidopsis is apparently functionally distinct, as evidenced by its failure
to compensate for the loss of PHYB function in phyB null mutants of Arabidopsis (Reed et al., 1993; Wester et
al., 1994).

In the two trees with dicots (e.g., Figs. 2, 4), PHYE is the sister group to the PHYB/PHYD clade. Since PHYD
and PHYE have not been amplified from monocots, the diversification of this part of the phytochrome gene
family may have taken place only during the diversification of dicots. Further sampling from Nymphaeales,
Piperales, Winterales, Laurales, and Magnoliales should address the question of whether the presence of just
PHYA, PHYB, and PHYC is the ancestral condition in angiosperms. Notably, however, preliminary analysis of
three sequences from Piper recently submitted to GenBank (Kolukisaoglu et al., unpublished), derived using a
different primer pair, suggests that they are orthologs of Arabidopsis PHYA, PHYB, and PHYC. Alternatively,
the inability to amplify PHYD and PHYE from monocots (and Piper) could mean that the oligonucleotide primers
designed in recent studies do not recognize and amplify all PHYD and PHYE homologs. This alternative
explanation should be evaluated in subsequent studies of the phytochrome gene family in monocots and in
magnoliids with uniaperturate pollen.

The relationships among the angiosperm and nonangiosperm PHY lineages were evaluated in two additional types
of analysis: (1) parsimony analyses of nucleotide sites homologous to the PCR target fragment, including all
angiosperm PHYA-E paralogs and all nonangiosperm PHY sequences for which there were corresponding nucleotide
data (about 330 bp); and (2) parsimony and distance analyses of amino acid sites homologous to the Mougeotia
fragment (App. 2, about 300 amino acids), including sequences from angiosperms shown in Appendix 1, and from
gymnosperms determined in this study (with sites coded as missing). Patterns that emerged from the nucleotide
sequence analyses included: (1) Ceratodon, Physcomitrella, Selaginella, Equisetum, Ginkgo, and Pseudotsuga
most commonly occurred as sister groups of a PHYB/D/E clade; (2) Mougeotia, when not designated as the
outgroup, was the sister lineage of the PHYC clade; (3) PHYCs were basal and paraphyletic in many cladograms
rooted at Mougeotia; in others, or if Mougeotia was removed from analyses, a PHYB/D + PHYE clade and a PHYA +
PHYC clade were most often resolved. Results of the amino acid sequence analyses indicate a major split
between PHYA + PHYC and the PHYB/D/E clade, each with a set of nonangiosperms as sister group. The only common
element of the two sets of analyses was the close relationship between the sequences available from Ginkgo and
Pseudotsuga and the PHYB/D/E clade. Further, the robustness to perturbation of the data, which is found in the
analysis of the full-length sequence data set (Fig. 2), is lost in these broad comparisons when the number of
sites is limited.

Recently, single phytochrome sequence fragments (561-654 bp) from a number of nonangiosperms, including Gnetum
and Ephedra, were deposited in GenBank (Kolukisaoglu et al., unpublished), bringing the total number of
nonangiosperm homologous PHY sequence fragments available for nucleotide analysis to 15. Preliminary analyses
of these sequences indicate that the data are still too fragmentary to draw conclusions regarding evolution of
specific loci, especially as some of the nonangiosperm taxa represented by a single sequence are likely to
have more than one PHY gene. Furthermore, organismal relationships depicted in these cladograms and
neighbor-joining trees, except for the pairs Ceratodon + Funaria (both mosses) and Metasequoia + Picea (both
conifers), are not well supported in bootstrap analysis.
Discussion

PHYTOCHROME EVOLUTION 

The evolutionary pattern that emerges from phytochrome gene
studies is that PHY gene diversity appears to be limited in
nonangiosperms, where often a single gene is found, while
diversity is much greater in angiosperms, where orthologs of
the PHYA, PHYB, PHYC, PHYD, and PHYE genes discovered in
Arabidopsis are present (Figs. 2-4). The data suggest that
divergence of at least two, and most likely three, of the loci
found in angiosperms preceded the diversification of flowering
plants. For example, orthologs of Arabidopsis PHYA, PHYB/D, and
PHYC have been detected in most angiosperm subclasses, and
there is evidence for two loci in some nonangiosperm groups.
Moreover, the model of a five-member phytochrome gene family
developed for Arabidopsis is probably not completely
appropriate for all angiosperms. For example, though the PCR
primers developed in this study annealed to and amplified dicot
orthologs of the Arabidopsis PHYA, PHYB/D, PHYC, and PHYE, they
annealed to and amplified only three paralogs in monocots:
PHYA, PHYB/D, and PHYC. The same primers applied to DNAs from
the Fabaceae most commonly amplified PHYA, PHYA', and PHYE;
rarely did they amplify PHYB/D homologs, and they have yet to
amplify PHYC. It is very possible that sequence divergence at
the primer sites precludes the amplification of all loci
present in some genomes, or that bias toward certain gene
family members has occurred during amplification cycles; i.e.,
PCR selection or drift (sensu Wagner et al., 1994) has
occurred. However, preliminary results indicate that the same
loci are obtained from genera in Fabaceae when primers
differing in GC content are used (Lavin, unpublished);
likewise, certain variations of initial amplification
conditions have not altered the set of loci detected in other
angiosperms (Mathews, unpublished). Thus, it is likely that not
all five genes characterized from Arabidopsis preceded the
early diversification of angiosperms. Indeed, data presented
here showing independent evolution of multiple PHYB/D-related
sequences in Arabidopsis, Lycopersicon, and Daucus indicate
that the divergence of the PHYB and PHYD loci in Arabidopsis
occurred sometime well after the diversification of dilleniid
families. Recent diversification of the phytochrome gene family
in angiosperms is also suggested by the occurrence of
PHYA-related sequences that have independently evolved in
Ceratophyllaceae, Caryophyllaceae, and Fabaceae (see Fig. 5).


TEMPO OF SEQUENCE EVOLUTION 

Using the 2-parameter model of Kimura (1981) to estimate
distances among all pairs of full-length coding sequences, and
a divergence time for Selaginella of 400 million years ago (Ma)
(Townrow, 1968), the estimated overall rate of evolution of PHY
lineages is 0.9 to 1.5 × 10⁻⁹ substitutions per site per year,
or about ten times as fast as rbcL (Chase et al., 1993). In
contrast, the rate of Jukes-Cantor corrected synonymous
substitutions (Ks) among PHY sequences from pooid and panicoid
grasses, with an estimated divergence time of 50 Ma (Doebley et
al., 1990), and among tropical woody tribes of Fabaceae, with
an estimated divergence time of 40 Ma (Herendeen, 1992; Wheeler
& Baas, 1992), is four to five times as fast as rbcL (Zurawski
et al., 1984; Doebley et al., 1990), or about 3.7 to 6.1 × 10⁻⁹
substitutions per site per year. Rates of Jukes-Cantor
corrected nonsynonymous substitutions (Ka) estimated from
pairwise comparisons with Selaginella for different portions of
full-length phytochrome molecules (App. 1) indicate that the
594 bp including and proximal to the chromophore attachment
site is the most conserved portion of the molecule (Ka = 3.2 to
4.6 × 10⁻¹⁰ subst./site/year), followed by the 2400 bp encoding
the N-terminus (Ka = 4.0 to 5.4 × 10⁻¹⁰ subst./site/year),
followed by 3384 bp comprising nearly the complete coding
region (Ka = 4.3 to 6.2 × 10⁻¹⁰ subst./site/year). It is
notable that Ks is consistently greater than Ka, even among the
most closely related PHY loci (e.g., Arabidopsis PHYB and PHYD,
and Fabaceae PHYA and PHYA'). The opposite pattern of
substitution (Ka greater than Ks) among codons associated with
functional divergence has been used to suggest recent positive
selection for divergent function among alleles (Nei & Hughes,
1991) and closely related loci (Ngai et al., 1993). However,
the PHY loci might not be amenable to this comparison because
of their more ancient divergence. Furthermore, the test cannot
be precisely applied without more specific knowledge about
codons associated with divergent functions.
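
The distance corrections and rate estimates named above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes the standard Kimura two-parameter and Jukes-Cantor formulas, assumes that a pairwise distance is converted to a per-lineage rate by dividing by twice the divergence time, and does not separate synonymous from nonsynonymous sites. All function names and the example numbers are hypothetical.

```python
import math

PURINES = {"A", "G"}


def count_differences(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for two aligned
    nucleotide sequences; columns with a gap or ambiguity code are skipped."""
    transitions = transversions = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions, sites


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance (standard formula, assumed here)."""
    ts, tv, n = count_differences(seq_a, seq_b)
    p, q = ts / n, tv / n
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


def jc_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance from the proportion of differing sites."""
    ts, tv, n = count_differences(seq_a, seq_b)
    p = (ts + tv) / n
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def rate_per_site_per_year(distance, divergence_time_years):
    """Assumed conversion: the distance accumulates along two lineages."""
    return distance / (2.0 * divergence_time_years)


if __name__ == "__main__":
    # Illustrative numbers only: a distance of 0.8 substitutions per site
    # and a 400 Ma divergence give a rate on the order of 1e-9.
    print(rate_per_site_per_year(0.8, 400e6))
```

Under these assumptions, comparing Ks with Ka amounts to running the same Jukes-Cantor correction separately on synonymous and nonsynonymous sites, which requires a codon-aware site classification that this sketch omits.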

In 42 relative rate tests (Wu & Li, 1985) used to evaluate the
hypothesis that rates within and among the PHY loci are
clocklike, 11 rate differences were significant (P < 0.05 or
0.01) under a model of rate constancy. All of these significant
differences were among, rather than within, PHY lineages
(Appendix 5), and are thus unlikely to be the source of
spurious long-branch attractions in organismal phylogenies
(Hendy & Penny, 1989).
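
The logic of a relative rate test can be sketched as below. This is a simplified illustration in the spirit of Wu & Li (1985), not a reproduction of the tests reported here: it assumes a single class of sites with Jukes-Cantor distances and the usual large-sample Jukes-Cantor variance, whereas the published method treats classes of substitutions separately. Lineages A and B are compared against an outgroup C, and O denotes the common ancestor of A and B; all names are hypothetical.

```python
import math


def jc_distance_from_p(p):
    """Jukes-Cantor distance from a proportion of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_p_from_distance(k):
    """Inverse of the Jukes-Cantor correction."""
    return 0.75 * (1.0 - math.exp(-4.0 * k / 3.0))


def jc_variance(p, n_sites):
    """Large-sample variance of the Jukes-Cantor distance."""
    return p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)


def relative_rate_z(p_ab, p_ac, p_bc, n_sites):
    """Z statistic for the difference K_AC - K_BC.

    Under rate constancy the expected difference is zero. The variance of
    the difference is taken as Var(K_AC) + Var(K_BC) - 2 Var(K_OC), where
    K_OC = (K_AC + K_BC - K_AB) / 2 is the shared path to the outgroup.
    """
    k_ab = jc_distance_from_p(p_ab)
    k_ac = jc_distance_from_p(p_ac)
    k_bc = jc_distance_from_p(p_bc)
    k_oc = max((k_ac + k_bc - k_ab) / 2.0, 0.0)
    p_oc = jc_p_from_distance(k_oc)
    variance = (jc_variance(p_ac, n_sites) + jc_variance(p_bc, n_sites)
                - 2.0 * jc_variance(p_oc, n_sites))
    return (k_ac - k_bc) / math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical proportions of differing sites over 1000 aligned sites.
    z = relative_rate_z(p_ab=0.10, p_ac=0.32, p_bc=0.28, n_sites=1000)
    print(abs(z) > 1.96)  # two-tailed test at the 0.05 level
```

A value of |z| above 1.96 would be read as a significant departure from rate constancy at P < 0.05 under these simplifying assumptions.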
IMPLICATIONS FOR ORGANISMAL PHYLOGENETIC
ANALYSIS

Phytochrome sequence data are providing a high degree of
phylogenetic resolution within the plant family Fabaceae, and
this suggests that the phytochrome gene family, at the least,
should be a promising source of data below the familial
taxonomic level. Among other sorts of promising taxonomic
characters is the presence of a novel legume locus related to
PHYA (here referred to as PHYA'), which should eventually serve
as a phylogenetic marker for a major subgroup of Fabaceae, or
possibly among related families, once its taxonomic
distribution becomes better known. One example of the
significant phylogenetic implications that have been revealed
so far is outlined below.

The phylogenetic relationships of the tropical woody
papilionoid legume genera Millettia, Lonchocarpus, Derris, and
putative close relatives of the tribe Millettieae remain poorly
resolved (Polhill, 1994), despite recent comprehensive
taxonomic studies (e.g., Evans et al., 1985; Geesink, 1981,
1984). Millettia is traditionally characterized only by its
elastically dehiscent legume (Dunn, 1912), but the paraphyletic
(and perhaps polyphyletic) nature of the genus has recently
been confirmed by chloroplast DNA data (Liston, 1992).
Lonchocarpus and Derris have indehiscent legumes; the former is
traditionally distinguished by its wingless pods and a staminal
tube with basal fenestrae, whereas Derris is traditionally
characterized by winged pods and a staminal tube lacking basal
fenestrae (Geesink, 1981, 1984). However, these traditional
characterizations have recently been disputed (Sousa & de
Sousa, 1981; Sousa & Delgado, 1993). These authors argue that
Lonchocarpus and Derris and relatives should be excluded from a
close relationship with Millettia and allies, and placed closer
to the genera of the tribe Dalbergieae, because of their
indehiscent pods and putative indeterminate inflorescences.
They also consider Millettia and close relatives to be part of
the tribe Robinieae. In contrast, Polhill (1971, 1981) placed
Millettia, Lonchocarpus, Derris, and close relatives together
as a tribe separate from Dalbergieae and Robinieae (Polhill,
1981), because the three lineages have a similar phytochemistry
and inflorescence structure (e.g., the pseudoracemose
inflorescence).

Phytochrome sequence data from PHYA, PHYA', and PHYE in these
tropical woody papilionoid genera show much promise in
providing at least some phylogenetic resolution to this group.
The representatives of Millettia, Lonchocarpus, Derris, and
certain allied genera (e.g., Piscidia) used in this analysis
are consistently monophyletic in all minimal-length trees and
in all three gene phylogenies (Fig. 5). Bootstrap confidence
intervals above 90% in each individual gene phylogeny, and an
amino acid deletion at position 405 (App. 4) in the PHYA'
sequence, further support the monophyly of these genera. The
phytochrome data suggest that this group is distinct from
Dalbergieae (represented by Dalbergia and Tipuana), Robinieae
(represented by Sesbania, Hebestigma, Hybosema, and Lennea),
and certain other genera of Millettieae (e.g., Kunstleria and
Dalbergiella). Such a grouping of Millettia, Lonchocarpus,
Derris, and Piscidia (and presumably certain other genera when
sampled) is consistent with chloroplast DNA data (Lavin,
unpublished; see also Doyle & Doyle, 1993) and certain
morphological data (Polhill, 1971). For example, this generic
group is distinguished from other genera in the same tribe
(such as Kunstleria and Dalbergiella), as well as the tribes
Dalbergieae and Robinieae, by an inflorescence in which the
flowers are fascicled along the raceme rachis, and by flowers
in which the standard petals have claws that are abruptly
contracted and subtended by calluses and inflexed auricles.
This grouping is not consistent with whether the pods are
dehiscent or not, or with what type of nonprotein amino acid is
accumulated in the seed. That three different phytochrome loci,
which are presumably under different evolutionary constraints,
all reveal this same monophyletic group suggests that
phytochrome sequence data will have a bearing on revealing
those morphological characters that may best serve as
phylogenetic markers in this taxonomically complex group of
papilionoid legumes.
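
The congruence argument above, that the same set of genera forms a clade in each of several independent gene trees, can be sketched as a simple check. This is a minimal illustration: the rooted trees are written as nested tuples, the function names are hypothetical, and the three toy topologies are invented placeholders that do not reproduce Figure 5.

```python
def leaf_set(tree):
    """Return the set of leaf names under a nested-tuple tree node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node of the rooted tree subtends exactly `group`."""
    group = frozenset(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Invented toy topologies standing in for three gene trees (NOT Fig. 5).
outgroups = (("Dalbergia", "Tipuana"), ("Sesbania", "Lennea"))
toy_tree_1 = ((("Millettia", "Lonchocarpus"), ("Derris", "Piscidia")), outgroups)
toy_tree_2 = (((("Millettia", "Derris"), "Piscidia"), "Lonchocarpus"), outgroups)
toy_tree_3 = (("Millettia", ("Lonchocarpus", ("Derris", "Piscidia"))), outgroups)

candidate = {"Millettia", "Lonchocarpus", "Derris", "Piscidia"}

if __name__ == "__main__":
    trees = [toy_tree_1, toy_tree_2, toy_tree_3]
    # The group is a clade in every toy tree even though the branching
    # order inside the group differs from tree to tree.
    print(all(is_monophyletic(t, candidate) for t in trees))
```

The check deliberately ignores relationships within the group, which is the sense in which loci with different internal topologies can still agree on the monophyly of one set of genera.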

FUTURE DIRECTIONS 

Phytochrome DNA sequence data, readily obtainable using PCR,
are shown here to be informative regarding questions of
organismal phylogeny in narrow comparisons, such as among
closely related genera. However, the degree of resolution
depicted in Figure 2 is promising for their use (if more
nucleotide sites are included) in broader comparisons as well;
notably, the branching order (except for the placement of
Psilotum) is consistent with current hypotheses of plant
phylogeny (summarized in Donoghue, 1994). Further, equally
promising is the potential to use composite trees inferred from
pairs of phytochrome loci that diverged prior to the
diversification of angiosperms to determine evolutionary
relationships among the major angiosperm lineages in the manner
Iwabe et al. (1989) inferred relationships among
archaebacteria, eubacteria, and eukaryotes.

The data presented also raise intriguing questions concerning
the evolution of individual phytochrome loci. For example, do
monocots and a certain subgroup of magnoliids with
uniaperturate pollen have only PHYA, PHYB, and PHYC, whereas in
eudicots and another subgroup of magnoliids, diversification of
the phytochrome gene family is much greater? If so, the
Arabidopsis model is not completely applicable to monocots. As
with the PHYA' locus in Fabaceae, the taxonomic distribution of
PHY genes in monocots should provide phylogenetic insight into
the divergence of monocots from dicots. Additionally, further
phytochrome data, especially from nonangiosperms, potentially
will reveal the history of phytochrome gene duplication events
in the context of green plant phylogeny.

Exploration of such questions may be facilitated by a variety
of tools; for example, preliminary data indicate that
development of locus-specific PCR primers will be productive.
So far, exclusively PHYB-related sequences have been determined
from Arabidopsis, Daucus, Quercus, and Spinacia using a 3'
PHYB/D/E-specific primer in combination with the conserved 5'
primer.

Sequences determined in this study from taxa other than
Fabaceae are available from GenBank under accession numbers
U08142-8184.
10 20 30 40

* * * *

Selaginella (sm) 1 ----
Ceratodon (cp) 2 ----
Adiantum (ac) 3 ----
Arabidopsis PHYA (atA) 4 ----MSG
Cucurbita PHYA (cpA) 5 ----MST
Pisum PHYA (psA) 6 ----MST
Solanum PHYA (stA) 7 ---MSTSLF ASDSDQLMSS
Avena PHYA (asA) 8 ----MSS
Oryza PHYA (osA) 9 ----MSS
Zea PHYA (zmA) 10 ----MSS
Arabidopsis PHYB (atB) 4 --MVSGVGGS GGGRGGGRGG EEEPSSSHTP NNRRGGEQAQ
Arabidopsis PHYD (atD) 11 MVSGGGSKTS GGEAASSGHR RSRHTSAAEQ AQSSANKALR
Arabidopsis PHYE (atE) 11 ----
Solanum PHYB (stB) 12 ---MAS GSRTKHSHHS
Oryza PHYB (osB) 13 MASGSRATPT RSPSSARPAA PRHQHHHSQS SGGSTSRAGG
Arabidopsis PHYC (atC) 4 ----
ANG ----
CON ----


50 

* 

MSTTKLTYSS 
MSATKKTYSS 
MSSTRHSYSS 
SRPT—QSSE 
SRPS—QSSS 
TRPS—QSSN 
SRPS—QSST 
SRPA—SSSS 
SRPTQCSSSS 
SRPAHSSSSS 
SSGTKSLRPR 
WQNQQPQNHG 
MGFESSSSAA 
SSQAQSSGTS 
GGGGGGGGGG 
—MSSNTSRS 






100 

* 


110 

* 


120 

★ 


sm GSSAKSKHSV
cp TTSAKSKHSV 
ac GGSGKSKHGR 
atA GSRRSRHSAR 
cpA NSGRSRHSTR 
psA NSGRSRNSAR 
StA TSSRSKHSAR 
asA SRNRQSSQAR 
osA SRTRWSSRAR 
zmA SRTRQSSRAR 
atB SNTESMSKSK 
atD GGTESTNKNK 
atE SNMKPQPQKS 
StB NVNYKDSISK 
osB GAAAAESVSK 
atC CSTRSRQNSR 

ANG- 

CON- 


RVAQTTADAK 
RVAQTTADAA 
RIAQTSANAK 
IIAQTTVDAK 
IIAQTSVDAN 
IIAQTTVDAK 
IIAQTSIDAK 
VLAQTTLDAE 
ILAQTTLDAE 
ILAQTTLDAE 
AIQQYTVDAR 
AIQQYTVDAR 
NTAQYSVDAA 
AIAQYTADAR 
AVAQYTLDAR 
V S SQVLVDAK 

-Q-DA- 

-Q-A- 


LHAVYEESGE 

LEAVYEMSGD 

LYAAYEESSE 

LHADFE-E 

VQADFE-E 

LHATFE-E 

LHADFE-E 

LNAEYE-E 

LNAEYE-E 

LNAEYE-E 

LHAVFEQSGE 
LHAVFEQSGE 
LFADFAQSIY 
LHAVFEQSGE 
LHAVFEQSGA 
LHGNFE-E 


SGDSFDYSKS 
SGDSFDYSKS 
SGS-FDYSQS 
SGSSFDYSTS 
SGNSFDYSSS 
SGSSFDYSSS 
SGDSFDYSSS 
SGDSFDYSKL 
YGDSFDYSKL 
SGDSFDYSKL 
SGKSFDYSQS 
SGKSFDYSQS 
TGKSFNYSKS 
SGKFFDYSQS 
SGRSFDYTQS 
SERLFDYSAS 
-F-Y- 

-F-Y- 


INATKSTGET 
VGQSAE—SV 
VSAGKEGI— 
VRVTGPW— 
VRVTSDVS— 
VRVSGSVD— 
VRVTNVAE— 
VEAQRDGP— 
VEAQRTTG— 
VEAQRSTP— 
LKTTTYGSSV 
LKTAPYDSSV 
VISPPN—HV 
VKTTTQ—SV 
LRASPT—PS 
INLNM-PS 


IPAQ-AV 

P-AGAV 

-SSQLV 

ENQPPRSDKV 

GDQQPRSDKV 

GDQQPRSNKV 

GEQRPKSDKV 

PVQQGRSEKV 

PEQQARSEKV 

PEQQGRSGKV 

PEQQ- 

PEQQ- 

PDEH- 

PERQ- 

SEQQ- 

SSCEIPSSAV 


-TAYLQRMQR 

-TAYLQRMQR 

-TAYLQRMQR 

TTTYLHHIQK 

TTAYLHHIQK 

TTAYLNHIQR 

TTAYLHQIQK 

-IAYLQHIQK 

-IAYLHHIQR 

-IAYLQHIQR 

ITAYLSRIQR 

ITAYLSRIQR 

ITAYLSNIQR 

ITAYLTKIQR 

IAAYLSRIQR 

-STYLQKIQR 

-YL—IQ- 

-YL-Q- 


130 

* 

sm GGLVQPFGCM 
cp EGLIQNFGCM 
ac GGLVQQFGCL 
atA GKLIQPFGCL 
cpA GKLIQPFGCL 
psA GKQIQPFGCL 
stA GKFIQPFGCL 
asA GKLIQTFGCL 
osA AKLIQPFGCL 
zmA GKLIQPFGCL 
atB GGYIQPFGCM 
atD GGYTQPFGCL 
atE GGLVQPFGCL 
StB GGHIQPFGCM 
OSB GGHIQPFGCT 
atC GMLIQPFGCL 

ANG-Q-FGC- 

CON-Q-FGC- 


140 

* 

LAV-EEGSFR 
VAV-EEPNFC 
IAV-EEETFR 
LAL-DEKTFK 
LAL-DDKTFK 
LAL-DEKTCK 
LAL-DEKTLK 
LAL-DEKSFN 
LAL-DEKTFN 
LAL-DEKSFR 
IAV—DESSFR 
IAV-EESTFT 
IAV-EEPSFR 
IAV-DEASFR 
LAVADDSSFR 
IW-DEKNLK 


150 

* 

VIAFSDNAGE 
VIAYSENASE 
VLHMCE-APE 
VIAYSENASE 
VIAYSENAPE 
WAYSENAPE 
VIAFSENAPE 
VIAFSENAPE 
VIALSENAPE 
VIAFSENAPE 
IIGYSENARE 
IIGYSENARE 
ILGLSDNSSD 
VIAYSENACE 
LLAYSENTAD 
VIAFSENTQE 
-S-N- 


160 

* 

MLDLMP-QSV 

FLDLIP-QAV 

MLDVAT-QAV 

LLTMAS-HAV 

MLTMVS-HAV 

MLTMVS-HAV 

MLTMVS-HAV 

MLTTVS-HAV 

MLTTVS-HAV 

MLTTVS-HAV 

MLGIMP-QSV 

MLGLMS-QSV 

FLGLLSLPST 

MLSLTP-QSV 

LLDLSPHHSV 

MLGLIP-HTV 

-L- 

-L- 


170 

* 

PSL-GSGQQD 
PSM-GEM—D 
PTM-GQY—S 
PSV-GEH—P 
PSM-GDY—P 
PSV-GDH—P 
PSV-GEH—P 

PSVD-DPP 

PSVD-DPP 

PNVD-DPP 

PTLE-KPE 

PSIE-D-KSE 

SHS-GEFDKV 

PSLE-KCE 

PSLDSSAVPP 
PSME-QRE 


180 

* 

VLTIGTDART 

VLGIGTDIRT 

RLCIGADVRT 

VLGIGTDIRS 

VLGIGTDVRT 

ALGIGTDIRT 

VLGIGIDIRT 

RLGIGTNVRS 

KLRIGTNVRS 

KLGIGTNVRS 

ILAMGTDVRS 

VLTIGTDLRS 

KGLIGIDART 

ILTIGTDVRT 

PVSLGADARL 

ALTIGTDVKS 

-G- 

-G- 


190 

* 

LFTAAAS-AL 

LFTPSSSAAL 

LLSPASASAL 

LFTAPSASAL 

IFTAPSASAL 

VFTAPSASAL 

IFTGPSGAAL 

LFSDQGATAL 

LFTDPGTTAL 

LFTDPGATAL 

LFTSSSSILL 

LFKSSSYLLL 

LFTPSSGASL 

LFTPSSSVLL 

LFAPSSAVLL 

LFLSPGCSAL 

-F-L 

-L 


APPENDIX 1. All available full-length phytochrome amino acid sequences and 776 residues from Ceratodon.
1 Hanelt et al. (1992); 2 Thummler et al. (1992); 3 Okamoto et al. (1993); 4 Sharrock & Quail (1989); 5 Sharrock et al.
(1986); 6 Sato (1988); 7 Heyer & Gatz (1992a); 8 Hershey et al. (1985); 9 Kay et al. (1989); 10 Christensen & Quail
(1989); 11 Clack et al. (1994); 12 Heyer & Gatz (1992b); 13 Dehesh et al. (1991). The triangle denotes the chromophore
attachment site. The sequences amplified in this study correspond to residues 329-431.
200 210 220 230 240 250 260
******* 

sm EKAAGAVDLS MLNPIWVQSK TSAKPFYAIV HRIDVGLVMD LEPVKASDTR VGSAAGALQS HKLAAKAISR 
cp EKAAATQDIS LLNPITVHCR RSGKPLYAIA HRIDIGIVID FEAVKMIDVP VSAAAGALQS HKLAARAITR 
ac DRVIGWDVS MFNPITVQSR SSGKPFYAIL HRNDVGLVID LEPIRPDDAS I-TG-GALQS HKLAAKAIAR 
atA QKALGFGDVS LLNPILVHCR TSAKPFYAII HRVTGSIIID FEPVKPYEVP M-TAAGALQS YKLAAKAITR 
cpA LKALGFGEVT LLNPILVHCR TSGKPFYAIV HRVTGSLIID FEPVKPYEGP V-TAAGALQS YKLAAKAITR 
psA QKALGFAEVS LLNPILVHCR TSGKPFYAII HRVTGSLIID FEPVKPYEVP M-TAAGALQS YKLAAKAITR 
StA QKALGFGEVS LLNPVLVHCK NSGKPFYAIV HRVTGSLIID FEPVKPYEVP M-TAAGALQS YKLAAKAITR 
asA HKALGFADVS LLNPILVQCK TSGKPFYAIV HRATGCLWD FEPVKPTEFP A-TAAGALQS YKLAAKAISK 
OSA QKALGFADVS LLNPILVQCK TSGKPFYAIV HRATGCLWD FEPVKPTEFP A-TAAGALQS YKLAAKAISK 
zmA QKALGFADVS LLNPILVQCK TSGKPFYAIV HRATGCLWD FEPVKPTEFP A-TAAGALQS YKLAAKAISK 
atB ERAFVAREIT LLNPVWIHSK NTGKPFYAIL HRIDVGWID LEPAR-TEDP ALSIAGAVQS QKLAVRAISQ 
atD ERAFVAREIT LLNPIWIHSN NTGKPFYAIL HRVDVGILID LEPAR-TEDP ALSIAGAVQS QKLAVRAISH 
atE SKAASFTEIS LLNPVLVHSR TTQKPFYAIL HRIDAGIVMD LEPAK-SGDP ALTLAGAVQS QKLAVRAISR 
stB ERAFGAREIT LLNPIWIHSK NSGKPFYAIL HRVDVGIVID LEPAR-TEDP ALSIAGAVQS QKLRSEGLFL 
OSB ERAFAAREIS LLNPLWIHSR VSSNPFYAIL HRIDVGWID LEPAR-TEDP ALSI AGAVQS QKLWRAISR 
atC EKAVDFGEIS ILNPITLHCR SSSKPFYAIL HRIEEGLVID LEPVSPDEVP V-TAAGALRS YKLAAKSISR 


ANG —A--LNP--PFYAI- HR-D -EP--AGA—S -KL- 

CON-—NP--P-YAI- HR-D -E--GA—S -KL- 


270 

★ 


280 

★ 


290 

★ 


300 

★ 


310 

★ 


320 

★ 


330 

* 


sm LQSLP-GGDI GLLCDTWEE VRDVTGYDLV MAYKFHEDEH 
cp LQALP-GGDI ELLCDTIVEE VRELTGYDRV MAFKFHEDEH 
ac LQSLP-GGDI GLLCDSWEE VHELTGFDRV MAYKFHEDEH 
atA LQSLP-SGSM ERLCDTMVQE VFELTGYDRV MAYKFHEDDH 
CpA LQSLP-SGSM ARLCDTMVQE VFELTGYDRV MAYKFHDDDH 
psA LQSLA-SGSM ERLCDTMVQE VFELTGYDRV MAYKFHEDDH 
StA LQSLP-SGSM ERLCDTMVQE VFELTGYDRV MGYKFHDDDH 
asA IQSLP-GGSM EVLCNTWKE VFDLTGYDRV MAYKFHEDDH 
OSA IQSLP-GGSM EVLCNTWKE LFDLTGYDRV MAYKFHEDDH 
zmA IQSLP-GGSM EALCNTWKE VFDLTGYDRV MAYKFHEDEH 
atB LQALP-GGDI KLLCDTWES VRDLTGYDRV MVYKFHEDEH 
atD LQSLP-SGDI KLLCDTWES VRDLTGYDRV MVYKFHEDEH 
atE LQSLP-GGDI GALCDTWED VQRLTGYDRV MVYQFHEDDH 
StB ICNHFLVGTL KLLCDTWES VRELTGYDRV MVYKFHEDEH 
OSB LQALP-GGDV KLLCDTWEH VRELTGYDRV MVYRFHEDEH 
atC LQALP-SGNM LLLCDALVKE VSELTGYDRV MVYKFHEDGH 

ANG-G— —LC-V—-LTGYDRV M-Y-FH-D-H 

CON-G— —LC-V—-TG-D-V M-FH-D-H 


340 

* 


350 

* 


360 

* 


370 

* 


sm FLFMKNRVRM ICDCSAPPVK 
cp FLLMKNRVRL IADCYASPVK 
ac FLFMKNRVRM ICDCRLPPVK 
atA FLFMKNRVRM IVDCNAKHAR 
cpA FLFMKNRVRM IVDCRAKHLK 
psA FLFMKNRVRM IVDCNAKHVK 
StA FLFMKNRVRM ICDCRAKHVK 
asA LLFMKNKVRM ICDCRARSIK 
OSA FLFMKNRVRM ICDCRARSIK 
zmA FLFMKNRVRM ICDCRARSVK 
atB FLFKQNRVRM IVDCNATPVL 
atD FLFKQNRVRM IVDCYASPVR 
atE FLFKQNRVRM ICDCNATPVK 
stB FLFKQNRVRM IVDCHATPVR 
OSB FLFRQNRVRM IADCHAAPVR 
atC FLFMRNKVRM ICDCSAVPVK 

ANG -LF—N-VRM I-DC-A- 

CON -L-N-VR- I-DC- 






ITQDKELRQP 
LIQDPDIRQP 
LIQDKTLSQP 
VLQDEKLSFD 
VLQDEKLQFD 
VLQDEKLPFD 
WQDEKLPFD 
VIEAEALPFD 
IIEDESLHLD 
IIEDEALSID 
WQDDRLTQS 
WQDDRLTQF 
VVQSEELKRP 
VTQDESLMQP 
VIQDPALTQP 
WQDKSLSQP 


ISLAGSTLRA 
VSLAGSTLRA 
MSLTGSTLRA 
LTLCGSTLRA 
LTLCGSTLRA 
LTLCGSTLRA 
LTLCGSTLRA 
ISLCGSALRA 
ISLCGSTLRA 
ISLCGSTLRA 
MCLVGSTLRA 
ICLVGSTLRA 
LCLVNSTLRA 
LCLVGSTLRA 
LCLVGSTLRS 
ISLSGSTLRA 
—L—S-LR- 
—L—S-LR- 


GEWAEIRRS 

GEWAEIRRM 

GEVVAEIRRT 

GEWSEVTKP 

GEVISEVAKP 

GEVIAEIAKP 

GEWSEITKP 

GEVFSEITKP 

GEVFAEITKP 

GEVFAEITKP 

GEWAESKRD 

GEWAESKRN 

GEWSEIRRS 

GEWAESKRS 

GEWAESRRS 

GEVIAECCRE 

GEV—E- 

GEV—E- 


DLEPYLGLHY 

DLEPYMGLHY 

DLEPYIGLHY 

GLEPYLGLHY 

GLQPYLGLHY 

GLEPYLGLHY 

GLEPYLGLHY 

GLEPYLGLHY 

GLEPYLGLHY 

GIEPYIGLHY 

DLEPYIGLHY 

DLEPYIGLHY 

DLEPYLGLHY 

DLEPYIGLHY 

NLEPYIGLHY 

DMEPYLGLHY 

-PY-GLHY 

-PY-GLHY 


PATDIPQASR 
PATDIPQASR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQAAR 
PATDIPQASR 
PATDIPQASR 
PATDIPQAAR 
PATDIPQASR 
PATDIPQASR 
SATDIPQASR 
-ATDIPQA-R 
-ATDIPQA-R 


▼

380 390 400 


PHGCHAQYMG 
PHGCHAQYMG 
PHGCHTQYMA 
PHSCHLQYMA 
PHSCHLQYME 
PHSCHLQYMA 
PHYCHLQYME 
PHSCHLQYME 
PHSCHLQYME 
PHSCHLKYME 
PHGCHSQYMA 
PHGCHAQYMT 
PHGCHTQYMA 
PHGCHAQYMA 
PHGCHGQYMA 
PHGCHAQYMS 
PH-CH—YM- 
PH-CH—YM- 


NMGSVASLVM 
NMGSIASLVM 
NMNSISSLVM 
NMDSIASLVM 
NMNSIASLVM 
NMDSIASLVM 
NMNSIASLVM 
NMNSIASLVM 
NMNSIASLVM 
NMNSIASLVM 
NMGSIASLAM 
NMGSIASLAM 
NMGSVASLAL 
NMGSIASLTL 
NMGSIASLVM 
NMGSVASLVM 
NM-S-ASL— 
NM-S—SL— 


AMIINDNDE- 
AVIINDNEE- 
AVIVNDSDDD 
AVWNEEDGE 
AVWNEGDEE 
AVWNDSDED 
AVWNDGDEE 
AVWNENEED 
AVWNENEDD 
AVWNENEED 
AVIINGNEDD 
AVIINGNEED 
AIWKGKD— 
AVIINGNDEE 
AVIISSGGDD 
SVTINGSDSD 
410 420 430 440

★ ★ ★ ★ 

sm -PSGGGGGGG QHKGRRLWGL WCHHTSPRS VPF-LRSACE 


cp -YSRGA IQRGRKLWGL WCQHTSPRT VPFPLRSVCE 

ac -SAGH SSQGIKLWGL WCHHTSPRY VPFPVRSACE 

atA GD-APDATTQ PQKRKRLWGL WCHNTTPRF VPFPLRYACE 

cpA NE-GPALQ QQKRKRLWGL WCHNSSPRF VPFPLRYACE 

PSA GD--SADAVL PQKKKRLWGL WCHNTTPRF VPFPLRYACE 
StA GE--SSDSSQ SQKRKRLWGL WSHNTTPRF APFPLRYACE 


asA DEAESEQPAQ QQKKKKLWGL LVCHHESPRY VPFPLRYACE 
OSA DEVGADQPAQ QQKRKKLWGL LVCHHESPRY VPFPLRYACE 
zmA DEPEPEQPPQ QQKKKRLWGL IVCHHESPRY VPFPLRYACE 


atB G-SNVAS GRSSMRLWGL WCHHTSSRC IPFPLRYACE 

atD G-NGVNTG GRNSMRLWGL WCHHTSARC IPFPLRYACE 

atE--SSKLWGL WGHHCSPRY VPFPLRYACE 

stB-AVGG GRNSMRLWGL WGHHTSVRS IPFPLRYACE 

OSB D--HNIARGS IPSAMKLWGL WCHHTSPRC IPFPLRYACE 

atC E-MNRD LQTGRHLWGL WCHHASPRF VPFPLRYACE 

ANG--LWGL -V-H-R- -PFPLRYACE 

CON--LWGL -V-R- -PF--R--CE 


480 490 500 510 

* * * *

sm LLCDMLLRDA -PIGIVSQSP NIMDLVKCDG AALYYGKRFW 
cp LLCDMLMRDA -PLGIVSQTP NIMDLVKCDG AALYYGKRVW 
ac LLCDMLLRDA -PIGIVSQSP NIMDLVTCDG AALYYGKKCW 
atA LLCDMLMRDA -PLGIVSQSP NIMDLVKCDG AALLYKDKIW 
CpA LLC DMLMRDA -PLGIVSRSP NIMDLVKSDG AALLYKKKIW 
psA LLC DMLMRDA -PLGIVSQSP NIMDLVKCDG AALFYRNKLW 
StA LLC DMLMRDA -PLGIVSQSP NIMDLIKCDG AALLYKNKIH 
asA MLSDMLFREA SPLTIVSGTP NIMDLVKCDG AALLYGGKVW 
OSA MLSDMLFRES SPLSIVSGTP NIMDLVKCDG AALLYGGKVW 
zmA MLSDMLFKES SPLSIVSGSP NIMDLVKCDG AALLYGDKVW 
atB LLCDMLLRDS -PAGIVTQSP SIMDLVKCDG AAFLYHGKYY 
atD LLCDMLLRDS -PAGIVTQRP SIMDLVKCNG AAFLYQGKYY 
atE LLCDMLLRDT -VSAIVTQSP GIMDLVKCDG AALYYKGKCW 
StB LLCDMLLRDS -PPGIVTQSP SIMDLVKCDG ALLYYQGKYY 
OSB LLCDMLLRDS -PTGIVTQSP SIMDLVKCDG AALYYHGKYY 
atC VLCDMLFRNA -PIGIVTQSP NIMDLVKCDG AALYYRDNLW 

ANG -L-DML--IV-P -IMDL-K--G A Y- 

CON -L-DML--IV-P -IMDL-G A Y- 


450 

★ 


460 

★ 


470 

* 


FLMQVFGLQL 
FLMQVFGMQL 
FLMQVFSLQL 
FLAQVFAIHV 
FLAQVFAIHV 
FLAQVFAIHV 
FLAQVFAILV 
FLAQVFAVHV 
FLAQVFAVHV 
FLAQVFAVHV 
FLMQAFGLQL 
FFMQAFGLQL 
FLMQAFGLQL 
FLMQAFGLQL 
FLMQAFGLQL 
FLTQVFGVQI 

F--Q-F- 

F--Q-F- 


NMEAAVAAHV 
NLHVELAAQL 
NMEVGMAAQV 
NKEVELDNQM 
NKELELENQI 
NKEIELEYQI 
NKELELENQF 
NREFELEKQL 
NKEFELERQV 
NKEFELEKQI 
NMELQLALQM 
NMELQLASQV 
QMELQLASQL 
NMELQLASQL 
NMELQLAHQL 
NKEAESAVLL 
— E- 


REKHILRTQT 
REKHILRTQT 
REKHILRTQT 
VEKNILRTQT 
IEKNILRTQT 
LEKNILRTQT 
LEKNILRTQT 
REKNILKMQT 
REKSILRMQT 
REKNILRMQT 
S EKRVLRTQT 
S EKRVLRMQT 
AEKKAMRTQT 
SEKHVLRTQT 
SEKHILRTGT 
KEKRILQTQS 

-EK- 

-EK- 


520 

★ 

LLGITPSEAQ 

LLGTTPTENQ 

LLGTTPTEAQ 

KLGTTPSEFH 

RLGLTPNDFQ 

LLGATPTESQ 

RLGMNPSDFQ 

RLQNAPTESQ 

RLQNAPTESQ 

RLQTAPTESQ 

PLGVAPSEVQ 

PLGVTPTDSQ 

LVGVTPNESQ 

PLGVTPTEAQ 

PLGVTPTEVQ 

SLGVTPTETQ 


530 

★ 

IKDIAEWLLE 

IKEIADWLLE 

IVDIAAWLLD 

LQEIASWLCE 

LLDIASWLSE 

LREIALWMSE 

LHDIVSWLCE 

IHDIAFWLSD 

IRDIAFWLSD 

IRDIAFWLSE 

IKDWEWLLA 

INDIVEWLVA 

VKDLVNWLVE 

IKDIVEWLLA 

IKDIIEWLTM 

IRDLIDWVLK 

-W- 

-W- 


540 

★ 

HH-KDSTGLS 

HH-MDSTGLS 

CH-KDSTGLS 

YH-MDSTGLS 

YH-MDSTGLS 

YH-TDSTGLS 

YH-TDSTGLS 

VH-RDSTGLS 

VH-RDSTGLS 

VH-GDSTGLS 

NH-ADSTGLS 

NH-SDSTGLS 

NHGDDSTGLT 

YH-GDSTGLS 

CH-GDSTGLS 

SH-GGNTGFT 

-H-TG— 

-H-TG-- 


550 560 570 580 590 600 610 

* * * * * * *

sm TDSLADAGYP GAASLGDEVC GMAAAKITAK DFLFWFRSHT AKEVKWGGAK HDPDDKDDGR KMHPRSSSKA 

cp TDSLADANYP GAHLLGDAVC GMAAAKITAK DFLFWFRSHT ATEVKWGGAK HDPDEKDDGR KMHPRSSFKA 

ac TDSLAKTGYP EASCLGDAVC GLAAAKITAT DFLFWFRSHT AKEVRWGGAR HDPEERDDGR RMHPRSSFKA 

atA TDSLHDAGFP RALSLGDSVC GMAAVRISSK DMIFWFRSHT AGEVRWGGAK HDPDDRDDAR RMHPRSSFKA 

cpA TDSLYDAGYP GAIALGDEVC GMAAVRITNN DMIFWFRSHT ASEIRWGGAK HEHGQKDDAR KMHPRSSFKA 

psA TDSLSDAGFP GALSLSDTVC GMAAVRITSK DIVFWFRSHT AAEIRWGGAK HEPGDQDDGR KMHPRSSFKA 

StA TDSLYDAGFP GALALGDAVC GMAAVRISDK DWLFWYRSHT AAEVRWGGAK HEPGEKDDGR KMHPRSSFKG 

asA TDSLHDAGYP GAAALGDMIC GMAVAKINSK DILFWFRSHT AAEIRWGGAK NDPSDMDDSR RMHPRLSFKA 

OSA TDSLHDAGYP GAAALGDMIC GMAVAKINSK DILFWFRSHT AAEIRWGGAK HDPSDKDDSR RMHPRLSFKA 

zmA TDSLQDAGYP GAASLGDMIC GMAVAKITSK DILFWFRSHT AAEIRWGGAK HDPSDKDDNR RMHPRLSFKA 

atB TDSLGDAGYP GAAALGDAVC GMAVAYITKR DFLFWFRSHT AKEIKWGGAK HHPEDKDDGQ RMHPRSSFQA 

atD TDSLGDAGYP RAAALGDAVC GMAVACITKR DFLFWFRSHT EKEIKWGGAK HHPEDKDDGQ RMNPRSSFQT 

atE TDSLVDAGYP GAISLGDAVC GVAAAEFSSK DYLLWFRSNT ASAIKWGGAK HHPKDKDDAG RMHPRSSFTA 

StB TDSLPDAGYP GAASLGDAVC GMAVAYITSK DFLFWFRSHT AKEIKWGGAK HHPEDKDDGQ RMHPRSSFKA 

OSB TDSLADAGYS GAAALGDAVS GMAVAYITPS DYLFWFRSHT AKEIKWGGAK HHPEDKDDGQ RMHPRSSFKA 

atC TESLMESGYP DASVLGESIC GMAAVYISEK DFLFWFRSST AKQIKWGGAR HDPNDR-DGK RMHPRSSFKA 

ANG T-SL-G-- -A--L-G-A-D-W-RS-T-WGGA--D-- -M-PR-SF-- 

CON T-SL--A--L-G-A-D-W-RS-T-WGGA--D-- -M-PR-S- 
620

* 


630 

★ 


640 

* 


650 

★ 


660 

★ 


670 

★ 


680 

★ 


sm FLEWKRRSL PWEDVEMDAI 
cp FLEWNKRSP PWEDVEMDAI 
ac FLEWKQQSL PWEDVEMDAI 
atA FLEWKTRSL PWKDYEMDAI 
cpA FLEWKTRSL PWKDYEMDAI 
psA FLEWKARSV PWKDFEMDAI 
St A FLEWKTRSI PWKDYEMDRI 
asA FLEWKMKSL PWSDYEMDAI 
osA FLEWKMKSL PWNDYEMDAI 
zmA FLEWKTKSL PWSDYEMDAI 
atB FLEWKSRSQ PWETAEMDAI 
atD FLEWKSRCQ PWETAEMDAI 
atE FLEVAKSRSL PWEISEIDAI 
stB FLEWKSRSS PWENAEMDAI 
OSB FLEWKSRSL PWENAEMDAI 
atC FMEIVRWKSV PWDDMEMDAI 

ANG F-E-PW-E-D-I 

CON F-E-PW-E-D-I 


HSLQLILRGS 

HSLQLILRGS 

HSLQLILRGS 

HSLQLILRNA 

HSLQLILRNT 

HSLQLILRNA 

HSLQLILRNA 

HSLQLILRGT 

HSLQLILRGT 

HSLQLILRGT 

HSLQLILRDS 

HSLQLILRDS 

HSLRLIMRES 

HSLQLILRDS 

HSLQLILRDS 

NSLQLIIKGS 

-SL-LI- 

-SL-LI- 


FQDIDDSDTK 
FRDIADSDTK 
FQDIDDSNTK 
FKDSETTDVN 
FKDTDATEIN 
SKDTDIIDLN 
FKDADAVNSN 
LNDASKPKRE 
LNDDIKPTRA 
LNDASKPAQA 
FKESEAAMNS 
FKESEAMDSK 
FTSSRPVLSG 
FKDAEASNSK 
FRDSAEGTSN 
LQEEH-SK 


TM-IHAR- 

TM-IHAR- 

TM-IHAR- 

TKVIYSK- 

RKSIQTT- 

TKAINTR- 

TISIHTK- 

ASLDNQI- 

ASLDNQV- 

SGLDNQI- 

KWDGWQPC 
AAAAGAVQPH 
NGVARDAN— 

AIVHAH- 

SKAIVNGQVQ 
TWDVP- 


LNDLKLQGMD 

LNDLKLQGVE 

LNDLKLQGLD 

LNDLKIDGIQ 

LGDLKIEGRQ 

LNDLKIEGMQ 

LNDLKIDGMQ 

-GDLKLDGLA 

-GDLKLDGLA 

-GDLKLDGLA 

RDMAGEQGID 

GDDMVQQGMQ 


LGEMELQGID 

LGELELRGID 

LVDNRVQKVD 


ELSTVANEMV 

ERNALANEMS 

ELSTVASEMV 

ELEAVTSEMV 

ELESVTSEMV 

ELEAVTSEMV 

ELEAVTAEMV 

ELQAVTSEMV 

ELQAVTSEMV 

ELQAVTSEMV 

ELGAVAREMV 

EIGAVAREMV 

ELTSFVCEMV 

ELSSVAREMV 

ELSSVAREMV 

ELCVIVNEMV 

E-EMV 

E-EM- 


690 

★ 


700 


710 720 



730 

* 


740 

★ 


INTRON


750 

★ 


sm RLIETATAPI 
cp RVLETAAAPI 
ac RLIETATAPI 
atA RLIETATVPI 
cpA RLIETATVPI 
psA RLIETATVPI 
stA RLIETASVPI 
asA RLMETATVPI 
osA RLMETATVPI 
zmA RLMETATVPI 
atB RLIETATVPI 
atD RLIETATVPI 
atE RVIETATAPI 
StB RLIETATAPI 
osB RLIETATVPI 
atC RLIDTAAVPI 

ANG R-TA—PI 

CON R-TA—PI 


LAVDSSGFIN 
LAVDSRGMIN 
LAVDGQGLIN 
LAVDSDGLVN 
LAVDLDGLIN 
LAVDVDGTVN 
FAVDVDGQVN 
LAVDGNGLVN 
LAVDSNGLVN 
LAVDGNGLVN 
FAVDAGGCIN 
FAVDIDGCIN 
FGVDSSGCIN 
FAVDVEGRIN 
FAVDTDGCIN 
FAVDASGVIN 
—VD—G—N 
—VD—G—N 


GWNAKVADVT 
AWNAKIAQVT 
GWNGKVAELT 
GWNTKIAELT 
GWNTKIAELT 
GWNIKIAELT 
GWNTKVAELT 
GWNQKAAELT 
GWNQKVAELT 
GWNQKVAELS 
GWNAKIAELT 
GWNAKIAELT 
GWNKKTAEMT 
GWNAKVAELT 
GWNAKVAELT 
GWNSKAAEVT 
GWN-K-AE— 
-WN-K-A- 


GLPVTEAMGR 

GLPVEEAMHC 

GLSFETAMGK 

GLSVDEAIGK 

GLPVDKAIGK 

GLPVGEAIGK 

GLPVDEAIGK 

GLRVDDAIGR 

GLRVDEAIGR 

GLRVDEAIGR 

GLSVEEAMGK 

GLSVEDAMGK 

GLLASEAMGK 

GVSVEEAMGK 

GLSVEEAMGK 

GLAVEQAIGK 

G-A-G- 

G-A- 


SLAKELVLHE 
SLTKDLVLDE 
SLAKELVHEE 
HFLT-LVEDS 
HLLT-LVEDS 
HLLT-LVEDS 
HLLT-LVEDS 
HILT-LVEDS 
HILT-WEES 
HILT-LVEDS 
SLVSDLIYKE 
SLVRELIYKE 
SLADEIVQEE 
SLVHDLVYKE 
SLVNDLIFKE 
P-VSDLVEDD 


SADMVERLLY 

SWWERLLS 

SKTIVERVLH 

SVEIVKRMLE 

SVEWRKMLF 

STDIVKKMLN 

SVDTVNKMLE 

SVPWQRMLY 

SVPWQRMLY 

SVSLVQRMLY 

NEATVNKLLS 

YKETVDRLLS 

SRAALESLLC 

SQETAEKLLY 

SEETVNKLLS 

SVETVKNMLA 


LALQGDEEQN 

LALQGEEEQN 

LALEGEEEQD 

NALEGTEEQN 

LALQGQEEQN 

LALQGEEEKN 

LALQGQEERN 

LALQGKEEKE 

LALQGKEEKE 

LALQGREEKE 

RALRGDEEKN 

CALKGDEGKN 

KALQGEEEKS 

NALRGEEDKN 

RALRGDEDKN 

LALEGSEERG 

-AL-G-E- 

-AL-G-E- 


760 

* 


770 

★ 


780 

★ 


790 

★ 


800 

* 


810 

* 


820 

★ 


sm VELKLKTFGG QKDKEAVIL- 
cp VEIKLKTFGT QTTERAVIL- 
ac IEIHLRTYDQ HKQKGWIL- 
atA VQFEIKTHLS RADAGPISL- 
cpA VQFEIKTHGS HIEVGSISL- 
psA VQFEIKTHGD QVESGPISL- 
stA VEFEIKTHGP SRDSSPISL- 
asA VRFEVKTHGP KRDDGPVIL- 
osA VKFEVKTHGS KRDDGPVIL- 
zmA VRFELKTHGS KRDDGPVIL- 
atB VEVKLKTFSP ELQGKAVFV- 
atD VEVKLKTFGS ELQGKAMFV- 
atE VMLKLRKFGQ NNHPDYSSDV 
StB VEIKLRTFGA EQLEKAVFV- 
OSB VEIKLKTFGP EQSKGPIFV- 
atC AEIRIRAFGP KRKSSPVEL- 

ANG-- 

CON-- 


—WNACASR 
—IVNACCSR 
—IVNTCCSR 
—WNACASR 
—WNACASR 
—IVNACASK 
—IVNACASK 
—WNACASR 
—WNACASR 
—WNACASR 
—VVNACSSK 
—VVNACSSK 
CVLVNSCTSR 
—WNACA-R 
—IVNACSTR 
—VVNTCCSR 

-VN-C- 

-VN-C- 


DVSDNVVGVC 
DASDFWGVF 
DVSNNWGVC 
DLHENWGVC 
DLRENWGVF 
DLRENVVGVC 
DVRDSWGVC 
DLHDHWGVC 
DLHDHWGVC 
DLHDHWGVC 
DYLNNIVGVC 
DYLNNIVGVC 
DYTENIIGVC 
DYTNNIVGVC 
DYTKNIVGVC 
DMTNNVLGVC 

D-GV- 

D-GV- 


FVGQDVTGQK 

FVGQDVTEQR 

FVGQDVTGQK 

FVAHDLTGQK 

FVAQDITGQK 

FVAQDITAQK 

FIAQDITGQK 

FVAQDMTVHK 

FVAQDMTVHK 

FVAQDMTVHK 

FVGQDVTSQK 

FVGQDVTGHK 

FVGQDITSEK 

FVGQDVTGEK 

FVGQDVTGQK 

FIGQDVTGQK 

F D-T—K 

F-D-T- 


WMDKFTRIQ 

MFMDRFTRIQ 

LVLDRFIRIQ 

TVMDKFTRIE 

MVMDKFTRLE 

TVMDKFTRIE 

SIMDKFTRIE 

LVMDKFTRVE 

LVMDKFTRVE 

LVMDKFTRVE 

IVMDKFINIQ 

IVMDKFINIQ 

AITDRFIRLQ 

WMDKFINIQ 

WMDKFINIQ 

TLTENYSRVK 


GDYKAIVQNP 
GGEKTTVQDP 
GDYKAIVQSL 
GDYKAIIQNP 
GDYKAIVQNP 
GDYKAIVQNP 
GDYRAIIQNP 
GDYKAIIHNP 
GDYKAIIHNP 
GDYKAIIHNP 
GDYKAIVHSP 
GDYKAIIHSP 
GDYKTIVQSL 
GDYKAIVHSP 
GDYKAIVHNP 
GDYARIMWSP 

GDY — I- 

G- 
830

* 


840 

* 


850 

* 


860 

* 


870 

* 


880 

* 


890 

* 


sm NPLIPPIFGA DEFGYCSEWN PAMEKLSGWR REEVLGKMLV GEIFGIQMMY CRLKGQDAVT KFMIVLNSAA 

cp HPLMRPSFDG DEFGRTFKRN SALGGL----- 

ac NPLIPPIFGA DEYGFCSEWN AAMEKLSNWR REEVLGKMLV GEIFGLQMVC CRLQGQDWT KLMIVLNDAV 

atA NPLIPPIFGT DEFGWCTEWN PAMSKLTGLK REEVIDKMLL GEVFGTQKSC CRLKNQEAFV NLGIVLNNAV 

cpA NPLIPPIFGS DEFGWCSEWN PAMAKLTGWS REEVIDKMLL GEVFGVHKSC CRLKNQEAFV NLGIVLNNAM 

psA NQLIPPIFGT DEFGWCCEWN AAMIKLTGWK REEVMDKMLL GEVFGTQMSC CRLKNQEAFV NFGIVLNKAM 

StA HPLIPPIFGT DQFGWCSEWN SAMTMLTGWR RDDVMDKMLL GEVFGTQAAC CRLKNQEAFV NFGVILNNAI 

asA NPLIPPIFGA DEFGWCSEWN AAMTKLTGWN RDEVLDKMLL GEVFDSSNAS CPLKNRDAFV SLCVLINSAL 

OSA SPLIPPIFGA DEFGWCSEWN AAMTKLTGWH RDEVINKMLL GEVFDSTNAS CLVKNKDAFV SLCILINSAL 

zmA NPLIPPIFGA DQFGWCSEWN AAMTKLTGWH RDEWDKMLL GEVFNSSNAS CLLKSKDAFV RLCIVINSAL 

atB NPLIPPIFAA DENTCCLEWN MAMEKLTGWS RSEVIGKMIV GEVFG-SC CMLKGPDALT KFMIVLHNAI 

atD NPLIPPIFAA DENTCCLEWN TAMEKLTGWP RSEVIGKLLV REVFG-SY CRLKGPDALT KFMIVLHNAI 

atE NPLIPPIFAS DENACCSEWN AAMEKLTGWS KHEVIGKMLP GEVFG-VF CKVKCQDSLT KFLISLYQGI 

stB NPLIPPIFAS DENTCCSEWN TAMEKLTGWS RGEIVGKMLV GEIFG-SC CRLKGPDAMT KFMIVLHNAI 

osB NPLIPPIFAS DENTCCSEWN TAMEKLTGWS RGEWGKLLV GEVFG-NC CRLKGPDALT KFMIVLHNAI 

atC STLIPPIFIT NENGVCSEWN NAMQKLSGIK REEWNKILL GEVFTTDDYG CCLKDHDTLT KLRIGFNAVI 


ANG —LIPPIF—-C-EWN -AM-KL-G—-K--E-F-C—K-- 

CON —L—P-F—-N -A L--K--E-F-C-- 

900 910 920 930 940 950 960 

* * * * * * *


sm DGQ-DTEKFP FAFFDRQGKY VEALLTATKR ADAEGSITGV FCFPHIASAE LQQALTVQRA TEKVALSKLK 

cp - 

ac NGQ-ESEKFP LVFYDRNGRR VEALLIASKR TDADGRITGV FCFLHTASPE LLQALIIKRA KEKV DK 

atA TSQ-DPEKVS FAFFTRGGKY VECLLCVSKK LDREGVVTGV FCFLQLASHE LQQALHVQRL AERTAVKRLK 

cpA CGQ-DPEKAS FGFLARNGMY VECLLCVNKI LDKDGAVTGF FCFLQLPSHE LQQALNIQRL CEQTALKRLR 

psA TGL-ETEKVP FGFFSRKGKY VECLLSVSKK IDAEGLVTGV FCFLQLASPE LQQALHIQRL SEQTALKRLK 

stA TGQ-ESEKIP FGFFARYGKY VECLLCVSKR LDKEGAVTGL FCFLQLASHE LQQALHVQRL SEQTALKRLK 

asA AGE-ETEKAP FGFFDRSGKY IECLLSANRK ENEGGLITGV FCFIHVASHE LQHALQVQQA SEQTSLKRLK 

osA AGD-ETEKAP FSFFDRNGKY IECLLSVNRK VNADGVITGV FCFIQVPSHE LQHALHVQQA SQQNALTKLK 

zmA AGE-EAEKAS FGFFDRNEKY VECLLSVNRK VNADGWTGV FCFIHVPSDD LQHALHVQQA SEQTAQRKLK 

atB GGQ-DTDKFP FPFFDRNGKF VQALLTANKR VSLEGKVIGA FCFLQIPSPE LQQALAVQRR QDTECFTKAK 

atD GGQ-DTDKFP FPFFDRKGEF IQALLTLNKR VSIDGKIIGA FCFLQIPSPE LQQALEVQRR QESEYFSRRK 

atE AGDNVPESSL VEFFNKEGKY IEASLTANKS TNIEGKVIRC FFFLQIINKE SGLSCPELKE SAQS LN 

stB GGQ-DTDKFP FSFFDRNGKY VQALLTRNKR VNMEGDTIGA FCFIQIASPE LQQALRVQRQ QEKKCYSQMK 

osB GGQ-DCEKFP FSFFDKNGKY VQALLTANTR SRMDGEAIGA FCFLQIASPE LQQAFEIQRH HEKKCYARMK 

atC SGQKNIEKLL FGFYHRDGSF IEALLSANKR TDIEGKVTGV LCFLQVPSPE LQYALQVQQI SEHAIACALN 

ANG-—F--L--G-—F--- 

CON-—F--L--G-—F--- 


970 980 990 1000 1010 1020 INTRON 1030

* * * * * * *

sm ELAYIRQEIK NPLYGIMFTR TLMETTDLSK DQKQYFETGA VCEKQIRKIL DDMDLESIED G—YLELDTT 

cp - 

ac ELSYVKEELK KPLEGLAFTR TVLEGTNLTI EQRQLIKTNA WCERQLRKIL -EDDLNNIEE G—YMDLEMS 

atA ALAYIKRQIR NPLSGIMFTR KMIEGTELGP EQRRILQTSA LCQKQLSKIL DDSDLESIIE G—CLDLEMK 

cpA ALGYIKRQIQ NPLSGIIFSR RLLERTELGV EQKELLRTSG LCQKQISKVL DESDIDKIID G—FIDLEMD 

psA VLTYMKRQIR NPLAGIVFSS KMLEGTDLET EQKRIVNTSS QCQRQLSKIL DDSDLDGIID G—YLDLEMA 

stA VLAYIRRQIR NPLSGIIFSR KMLEGTSLGE EQKNILHTSA QCQRQLDKIL DDTDLDSIIE G—YLDLEML 

asA AFSYMRHAIN NPLSGMLYSR KALKNTDLNE EQMKQIHVGD NCHHQINKIL ADLDQDSITE KSSCLDLEMA 

osA AYSYMRHAIN NPLSGMLYSR KALKNTGLNE EQMKEVNVAD SCHRQLNKIL SDLDQDSVMN KSSCLDLEMV 

zmA AFSYMRHAIN KPLSGMLYSR ETLKSTGLNE EQMRQVRVGD NCHRQLNKIL ADLDQDNITD KSSCLDLDMA 

atB ELAYICQVIK NPLSGMRFAN SLLEATDLNE DQKQLLETSV SCEKQISRIV GDMDLESIED G—SFVLKRE 

atD ELAYIFQVIK NPLSGLRFTN SLLEDMDLNE DQKQLLETSV SCEKQISKIV GDMDVKSIDD G—SFLLERT 

atE ELTYVRQEIK NPLNGIRFAH KLLESSEISA SQRQFLETSD ACEKQITTII ESTDLKSIEE G—KLQLETE 

stB ELAYICQEIK SPLNGIRFTN SLLEATNLTE NQKQYLETSA ACERQMSKII RDIDLENIED G—SLTLEKE 

osB ELAYIYQEIK NPLNGIRFTN SLLEMTDLKD DQRQFLETST ACEKQMSKIV KDASLQSIED G—SLVLEKG 

atC KLAYLRHEVK DPEKAISFLQ DLLHSSGLSE DQKRLLRTSV LCREQLAKVI SDSDIEGIEE G—YVELDCS 

ANG-Y--P---Q--C—Q---L- 

CON-Y--P---Q--C—Q---L- 


Appendix 1. Continued. 

1040 1050 1060 1070 1080 1090 1100 

sm EFMMGTVMDA VISQGMITSK EKNLQLIRET PKEIKAMFLY GDQVRLQQVL ADFLLNAIRF TPSSEN- 

cp ----- 

ac EFFMGSVIDA VISQGMAASR GKGVQILTEI PNDVKLMCLF GDQARLQQVL ADLLFCAINH ATTTNEDEKD 

atA EFTLNEVLTA STSQVMMKSN GKSVRITNET GEEVMSDTLY GDSIRLQQVL ADFMLMAVNF TPSGG- 

cpA EFTLHEVLMV SISQVMLKIK GKGIQIVNET PEEAMSETLY GDSLRLQQVL ADFLLISVSY APSGG- 

psA EFTLHEVLVT SLSQVMNRSN TKGIRIANDV AEHIARETLY GDSLRLQQVL ADFLLISINS TPNGG- 

stA EFKLHEVLVA SISQVMMKSN GKNIMISNDM VEDLLNETLY GDSPRLQQVL ANFLLVSVNS TPSGG- 

asA EFLLQDVWA AVSQVLITCQ GKGIRISCNL PERFMKQSVY GDGVRLQQIL SDFLFISVKF SPVGG- 

osA EFVLQDVFVA AVSQVLITCQ GKGIRVSCNL PERYMKQTVY GDGVRLQQIL SDFLFVSVKF SPVGG- 

zmA EFVLQDVWS AVSQVLIGCQ AKGIRVACNL PERSMKQKVY GDGIRLQQIV SDFLFVSVKF SPAGG- 

atB EFFLGSVINA IVSQAMFLLR DRGLQLIRDI PEEIKSIEVF GDQIRIQQLL AEFLLSIIRY APSQE- 

atD EFFIGNVTNA WSQVMLWR ERNLQLIRNI PTEVKSMAVY GDQIRLQQVL AEFLLSIVRY APMEG- 

atE EFRLENILDT IISQVMIILR ERNSQLRVEV AEEIKTLPLN GDRVKLQLIL ADLLRNIVNH APFPNS- 

stB DFFLGSVIDA WSQVMLLLR EKGVQLIRDI PEEIKTLTVH GDQVRIQQVL ADFLLNMVRY APSPDG- 

osB EFSLGSVMNA WSQVMIQLR ERDLQLIRDI PDEIKEASAY GDQYRIQQVL CDFLLSMVRF APAENG- 

atC EFGLQESLEA WKQVMELSI ERKVQISCDY PQEVSSMRLY GDNLRLQQIL SETLLSSIRF TPALRGL- 

ANG -F--Q---GD-Q---P- 

CON -F--Q---GD-Q--- 


1110 1120 INTRON 1130 1140 1150 1160 1170

* * * * * * *


sm WVGIKVATSR KRLGGWHVM HLEFRITHPG VGLPEELVQE MFDRGRGM-T QEGLGLSMCR KLVKLMN-GE
cp -
ac WVTIKVSRTK TRLDDGVHLM HFESRISHSG QGISEALVEE MTNKSQKW-T PEGLAISISC TLIRLMN-GD
atA QLTVSASLRK DQLGRSVHLA NLEIRLTHTG AGIPEFLLNQ MFGTEE-DVS EEGLSLMVSR KLVKLMN-GD
cpA QLTISTDVTK NQLGKSVHLV HLEFRITYAG GGIPESLLNE MFGSEE-DAS EEGFSLLISR KLVKLMN-GD
psA QWIAASLTK EQLGKSVHLV NLELSITHGG SGVPEAALNQ MFGNNV-LES EEGISLHISR KLLKLMN-GD
stA KLSISGKLTK DRIGESVQLA LLEFRIRHTG GGVPEELLSQ MFGSEA-DAS EEGISLLVSR KLVKLMN-GE
asA SVEISSKLTK NSIGENLHLI DLELRIKHQG LGVPAELMAQ MFEEDNKEQS EEGLSLLVSR NLLRLMN-GD
osA SVEISCSLTK NSIGENLHLI DLELRIKHQG KGVPADLLSQ MYEDDNKEQS DEGMSLAVSR NLLRLMN-GD
zmA SVDISSKLTK NSIGENLHLI DFELRIKHRG AGVPAEILSQ MYEEDNKEQS EEGFSLAVSR NLLRLMN-GD
atB WVEIHLSQLS KQMADGFAAI RTEFRMACPG EGLPPELVRD MFHSSR-WTS PEGLGLSVCR KILKLMN-GE
atD SVELHLCPTL NQMADGFSAV RLEFRMACAG EGVPPEKVQD MFHSSR-WTS PEGLGLSVCR KILKLMN-GG
atE WVGISISPGQ ELSRDNGSRI HLQFRMIHPG KGLPSEMLSD MFETRDGWVT PDGLGLKLSR KLLEQMN-GR
stB WVEIQLRPSM MPISDGVTW HIELGLYAPG -RLPPELVQD MFHSSR-WVT QEGLGLSMCR KMLKLMN-GE
osB WVEIQVRPNI KQNSDGTDTM LFPFRFACPG EGLPPEIVQD MFSNSR-WTT QEGIGLSICR KILKLMG-GE
atC CVSFKVIARI EAIGKRMKRV ELEFRIIHPA PGLPEDLVRE MFQPLRKGTS REGLGLHITQ KLVKLMERGT

ANG - -P- M- --G--L- -M-
CON - M- --G- -M-


1180 1190 1200 1210 

* * * *

sm VEYIREAGKN YFLVSLELPL AQRDDAGSVK FQASS-
cp -
ac VKYTTDAGNK CFLVTIQFPL AHRDDATSVR - -
atA VQYLRQAGKS SFIITAELAA ANK---
cpA VRYMREAGKS SFIITVELAA AHKSRTT---
psA VRYLKEAGKS SFILSVELAA AHKLKG---
stA VQYLREAGRS TFIISVELAV ATKSS---
asA VRHLREAGVS TFIITAELAS APTAMGQ---
osA VRHMREAGMS TFILSVELAS APAK---
zmA IRHLREAGMS TFILTAELAA APSAVGR---
atB VQYIRESERS YFLIILELPV PRKRPLSTAS GSGDMMLMMP Y
atD VQYIREFERS YFLIVIELPV PLMMMMPSS- - -
atE VSYVREDERC FFQVDLQVKT MLGVESRGTE GSSSIK- -
stB IQYIRESERC YFLIILDLPM TRKGPKSVG- - -
osB VQYIRESERS FFHIVLELPQ PQQAASRGTS - -
atC LRYLRESEMS AFVILTEFPL I---

ANG--F-
CON--F-


Appendix 1. Continued. 

300 310 320 330 340 350 360
* * * * * * *

Mougeotia¹    DEHGEWAEIRRSDLEPYLGLHYPATDIPQASRFLFIKNRIRMICDCTSPQVKWQDSRIPQEMS
Ceratodon²    DEHGEWAEIRRMDLEPYMGLHYPATDIPQASRFLLMKNRVRLIADCYASPVKLIQDPDIRQPVS
Selaginella   DEHGEWAEIRRSDLEPYLGLHYPATDIPQASRFLFMKNRVRMICDCSAPPVKITQDKELRQPIS
Psilotum'     DEHGEWAEIRRSDLEPFVGIHYPATDIPQACRFLFLKNRVTMICDCYAPPIRIIQDRQLKQPLS
Adiantum      DEHGEWAEIRRTDLEPYIGLGYPATDIPQAARFLFMKNRVRMICDCRLPPVKLIQDKTLSQPMS
Anemia        DEHGEWAEIRRSDLEPYMGLHYPATDIPQAARFLFMKNRVRLIYDCRLPPVKVIQDKNLVQPLS
Dryopteris J  DEHGEVLAEIRRSDLEPYLGLHYPATDIPQASRFLFMKNRVRMICDCRAIPVRVIQDKELRQPLS

370 380 390 400 410 420
* * * * * *

Mougeotia     LAGSTMRGVHGCHTQYMMNMGSTASLVMCVTINDTNE-IAGGPGMKGRKLWGLIVCHHST
Ceratodon     LAGSTLRAPHGCHAQYMGNMGSIASLVMAVIINDNEE-YSRGAIQRGRKLWGLWCQHTS
Selaginella   LAGSTLRAPHGCHAQYMGNMGSVASLVMAMIINDNDE--PSGGGGGGGQHKGRRLWGLWCHHTS
Psilotum      LAGSTLRAPHGCHAHYMGNMGSIASLVMAVIVKRHGEED-RSLGFQSQNGNRLWGMWCHHTT
Adiantum      LTGSTLRAPHGCHTQYMANMNSISSLVMAVIVNDSDDD-SAGHSSQGIKLWGLWCHHTS
Anemia        LAGSTLRAPHRCHAEYMGNMGSIASLGMAVIVNDDDSSD-AGNMQQRTRLWGLWCHHTS
Dryopteris    LAGSTLRAPHGCHGQYMANMGSIASLVMAWVNDNDED-LSNRPHQPKMRRLWGLWCHHTT

430 440 450 460 470 480 490
* * * * * * *

Mougeotia     PRHIPFPIHSACEFLMQVFGLQLNMEAELAAQHREKHILRTQTLLCDMLLRDA-PMGIVSQSPN
Ceratodon     PRTVPFPLRSVCEFLMQVFGMQLNLHVELAAQLREKHILRTQTLLCDMLLRDA-PIGIVSQTPN
Selaginella   PRSVPF-LRSACEFLMQVFGLQLNMEAAVAAHVREKHILRTQTLLCDMLLRDA-PIGIVSQSPN
Psilotum      PRAVPFALRCACEFFAQVFALQLNMELELAAQMREKDILRTQSLLCDMLLRDA-PIGIVTRSPN
Adiantum      PRYVPFPVRSACEFLMQVFSLQLNMEVGMAAQVREKHILRTQTLLCDMLLRDA-PIGIVSQSPN
Anemia        TRYVPFPLRSACEFLMQVFSLELNMEVELAAQRREKHILQTQTLLCDMLLRDA-PIGIVSQSPN
Dryopteris    PRAVPFALRSACEFLMQVFGLQINMELELAAQMREKHILRTQTLLCDMLLRDA-PIGIVSESPN

APPENDIX 2. Regions of all nonangiosperm amino acid sequences available that are homologous with the Mougeotia gene fragment, numbered with reference to Appendix 1. ¹Winands et al. (1992); ²Thümmler et al. (1992); ³Hanelt et al. (1992); ⁴Okamoto et al. (1993); ⁵Maucher et al. (1992).
APPENDIX 3. Sources of PHY sequences determined in this study. Arrangement of flowering plants follows Cronquist (1981) and Polhill (1994).

Subclass/Tribe        Species                                                 Source/Voucher

Sphenophyta           Equisetum arvense L.                                    P. Soltis (no voucher)
Pinophyta             Ginkgo biloba L.                                        S. Mathews 365 MONT
                      Pseudotsuga menziesii (Mirb.) Franco                    S. Mathews s.n. MONT
Magnoliophyta
  Monocots
    Alismatidae       Elodea Michx. sp.                                       S. Mathews (no voucher)
    Arecidae          Lemna gibba L.                                          J. Silverthorne (no voucher)
    Commelinidae      Hordeum vulgare L.                                      S. Mathews s.n. MONT
                      Calamovilfa longifolia (Hook.) Scribn.                  Lavin s.n. MONT
                      Panicum capillare L.                                    Lavin s.n. MONT
    Zingiberidae      Billbergia nutans H. Wendl                              S. Mathews 351 MONT
    Liliidae          Muscari Mill. sp.                                       S. Mathews (no voucher)
  Dicots
    Magnoliidae       Ceratophyllum demersum L.                               S. Mathews s.n. MONT
                      Aquilegia L. sp.                                        S. Mathews (no voucher)
    Hamamelidae       Urtica dioica L.                                        S. Mathews 330 MONT
                      Quercus turbinella Greene                               J. M. Tucker 4491 UCD
    Caryophyllidae    Dianthus caryophyllus L.                                R. Woodson (no voucher)
                      Spinacia oleracea L.                                    S. Mathews (no voucher)
    Dilleniidae       Arabidopsis thaliana (L.) Schur                         S. Mathews (no voucher)
    Asteridae         Lycopersicon esculentum Mill.                           S. Mathews (no voucher)
                      Antirrhinum majus L.                                    S. Mathews 301 MONT
    Rosidae           Daucus carota L.                                        S. Mathews (no voucher)
      Fabaceae
        Dalbergieae   Dalbergia L.f. sp.                                      Lavin 7141 MONT
                      Tipuana tipu (Benth.) Kuntze                            Lavin 6184 BH
        Galegeae      Caragana arborescens Lam.                               Lavin 5907 RM
                      Clianthus formosus (G. Don) Ford & Vick                 Krukoff s.n. K
        Millettieae   Dalbergiella nyasae Baker f.                            Muller 2686 K
                      Derris elliptica (Wallich) Benth.                       Michigan State Univ. Conservatory (no voucher)
                      Kunstleria blackii (F. Muell.) Prain                    Pedley 5005 K
                      Lonchocarpus eriocarinalis Micheli                      Lavin 5325a BH
                      Millettia dura Dunn                                     Lock 83/124 K
                      Millettia richardiana (Baill.) D. J. Du Puy & J. Labat  Schrire et al. 2555 K
                      Piscidia piscipula (L.) Sarg.                           Lavin & Luckow 5793a TEX
                      Wisteria floribunda (Willd.) DC.                        Lavin 6205 BH
                      Xeroderris stuhlmannii (Taub.) Mendonça & E. P. Sousa   Corby 2162 K
        Robinieae     Hebestigma cubense (HBK) Urb.                           Lavin 5611 TEX
                      Lennea melanocarpa (Schltdl.) Vatke ex Harms            Lavin & Delgado 8217 MEXU
                      Sesbania sesban (L.) Merr.                              Potter 870410 BH
                      Sesbania vesicaria (Jacq.) Elliot                       Lavin s.n. TEX
        Sophoreae     Myrospermum sousanum A. Delgado & M. C. Johnston        Delgado & Johnston s.n. TEX
        Vicieae       Lathyrus odoratus L.                                    Lavin 6170 MONT
APPENDIX 5. Relative rate tests (Wu & Li, 1985) to detect rate asymmetry. * P < 0.05; ** P < 0.01. d13 and d23 are the number of nonsynonymous (or synonymous in legume comparisons) substitutions per site between species 1 and 3, and species 2 and 3, respectively; under the null hypothesis d13 = d23. SE is standard error.

Species 1       Species 2       Species 3 (reference)    d13 - d23 ± SE

Chromophore region only (330-594 bp) compared
ArabidopsisA    CucurbitaA      Selaginella               0.0082 ± 0.0194
ArabidopsisA    SolanumA        Selaginella               0.0024 ± 0.0415
ArabidopsisA    OryzaA          Selaginella              -0.0033 ± 0.0416
ArabidopsisA    ArabidopsisB    Selaginella               0.0451 ± 0.0395
ArabidopsisA    ArabidopsisC    Selaginella               0.0761 ± 0.0379
ArabidopsisA    ArabidopsisD    Selaginella               0.0537 ± 0.0390
ArabidopsisA    ArabidopsisE    Selaginella               0.0709 ± 0.0387
SolanumA        SolanumB        Selaginella               0.0847 ± 0.0373*
OryzaA          OryzaB          Selaginella               0.0639 ± 0.0387
AvenaA          ZeaA            Selaginella              -0.0081 ± 0.0161
MyrospermumA    HebestigmaA     PisumA                    0.0661 ± 0.1679
MillettiaA      SesbaniaA       MyrospermumA              0.1337 ± 0.1350
PisumA          HebestigmaA     MyrospermumA              0.2012 ± 0.1466
HebestigmaE     MillettiaE      MyrospermumE             -0.0536 ± 0.1002

N-terminal encoding sequence (2400 bp) compared
ArabidopsisA    CucurbitaA      Selaginella               0.0064 ± 0.0216
ArabidopsisA    SolanumA        Selaginella              -0.0107 ± 0.0220
ArabidopsisA    OryzaA          Selaginella               0.0053 ± 0.0216
ArabidopsisA    ArabidopsisB    Selaginella               0.0272 ± 0.0253
ArabidopsisA    ArabidopsisC    Selaginella              -0.0115 ± 0.0214
ArabidopsisA    ArabidopsisD    Selaginella               0.0182 ± 0.0213
ArabidopsisA    ArabidopsisE    Selaginella              -0.0228 ± 0.0225
ArabidopsisB    ArabidopsisC    Selaginella              -0.0387 ± 0.0221
ArabidopsisB    ArabidopsisE    Selaginella              -0.0500 ± 0.0218*
ArabidopsisB    SolanumB        Selaginella               0.0346 ± 0.0195
ArabidopsisB    OryzaB          Selaginella              -0.0008 ± 0.0204
SolanumA        SolanumB        Selaginella               0.0725 ± 0.0205**
OryzaA          OryzaB          Selaginella               0.0211 ± 0.0210

Full-length coding sequence (3384 bp) compared
ArabidopsisA    CucurbitaA      PisumA                   -0.0112 ± 0.0111
ArabidopsisA    SolanumA        Selaginella              -0.0145 ± 0.0199
ArabidopsisA    OryzaA          Selaginella              -0.0226 ± 0.0202
ArabidopsisA    ArabidopsisB    Selaginella               0.0476 ± 0.0187*
ArabidopsisA    ArabidopsisC    Selaginella              -0.0346 ± 0.0206
ArabidopsisA    ArabidopsisD    Selaginella               0.0339 ± 0.0190
ArabidopsisA    ArabidopsisE    Selaginella              -0.0273 ± 0.0204
ArabidopsisE    ArabidopsisB    Selaginella               0.0749 ± 0.0194**
ArabidopsisE    ArabidopsisD    Selaginella               0.0612 ± 0.0197**
ArabidopsisC    ArabidopsisB    Selaginella               0.0822 ± 0.0194**
ArabidopsisC    ArabidopsisD    Selaginella               0.0685 ± 0.0197**
ArabidopsisB    SolanumB        Selaginella               0.0326 ± 0.0234
ArabidopsisB    OryzaB          Selaginella              -0.0085 ± 0.0179
SolanumA        SolanumB        Selaginella               0.0947 ± 0.0183**
OryzaA          OryzaB          Selaginella               0.0617 ± 0.0194**
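
The asterisks in the table above can be related to the tabulated values with a short calculation. The Python sketch below assumes that the difference d13 - d23 divided by its standard error is compared against a standard normal distribution (two-tailed); that reading of the test, the function names `relative_rate_test` and `significance_mark`, and the choice of example rows are illustrative assumptions, not details stated in the appendix. Because the tabulated values are rounded, entries whose ratio falls very close to a threshold may not match the printed asterisks exactly.

```python
import math


def relative_rate_test(diff, se):
    """Return (z, two-tailed P) for the null hypothesis d13 - d23 = 0.

    Assumes the difference is approximately normally distributed, so that
    z = diff / se (an illustrative assumption, not taken from the appendix).
    """
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal P-value
    return z, p


def significance_mark(p):
    """Map a P-value to the asterisk notation used in the caption."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Three rows copied from the full-length comparisons (reference: Selaginella).
rows = [
    ("SolanumA", "SolanumB", 0.0947, 0.0183),
    ("ArabidopsisA", "ArabidopsisB", 0.0476, 0.0187),
    ("ArabidopsisA", "OryzaA", -0.0226, 0.0202),
]

for sp1, sp2, diff, se in rows:
    z, p = relative_rate_test(diff, se)
    print(f"{sp1} vs {sp2}: z = {z:.2f}, P = {p:.4f} {significance_mark(p)}")
```

Under this assumption the three example rows come out as significant at P < 0.01, significant at P < 0.05, and not significant, respectively, in agreement with the marks printed in the table.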








